SESH P
Senior Data Engineer
Contact No: PHONE NUMBER AVAILABLE
E-Mail: EMAIL AVAILABLE
LinkedIn: https://LINKEDIN LINK AVAILABLE

Professional Summary:
- 8+ years of professional IT experience, including 5+ years of designing and developing Data Lake solutions using the Big Data stack and multiple programming languages (Python, Java, R, Scala).
- Proven track record in utilizing cloud platforms such as Azure and AWS, with proficiency in creating scalable data storage solutions, optimizing ETL workflows, and implementing CI/CD pipelines for efficient deployment.
- Experienced in implementing and optimizing CI/CD pipelines using Jenkins and GitLab CI/CD, ensuring seamless integration and deployment of software applications.
- Proficient in Infrastructure as Code (IaC) tools such as Terraform and AWS CloudFormation, as well as configuration management tools like Ansible, Chef, and Puppet, enabling efficient automation and management of infrastructure resources.
- Extensive experience in implementing and optimizing data processing applications using Apache Spark, Hadoop, and Databricks, with a focus on large-scale data transformations and analytics.
- Skilled in data science concepts including data mining, predictive modeling, statistical analysis, and machine learning algorithms, enabling effective extraction of insights and development of solutions for diverse business challenges.
- Skilled in working with various data warehousing technologies, including Snowflake, Redshift, and Teradata, ensuring optimal performance and scalability for analytical processing.
- Skilled in leveraging relational (SQL) databases such as MySQL, PostgreSQL, and SQL Server for managing structured data efficiently.
- Experienced in ETL processes leveraging industry-standard tools such as Talend, Informatica, IBM DataStage, and Matillion to efficiently extract, transform, and load data across diverse systems and platforms.
- Skilled in implementing real-time data processing solutions using Apache Kafka Streams, Apache Flink, and Apache Storm, ensuring high-throughput and low-latency data processing.
- Proficient in cloud-based streaming platforms like AWS Kinesis and Azure Stream Analytics, enabling seamless ingestion, processing, and analysis of data streams at scale.
- Skilled in handling unstructured data sources such as JSON, XML, and text files, leveraging tools like Python and Bash scripting for parsing, processing, and transforming data.
- Experienced in utilizing NoSQL databases such as Azure Cosmos DB, MongoDB, Cassandra, and Amazon DynamoDB for storing and managing unstructured data efficiently.
- Proficient in utilizing data serialization formats like Avro, Parquet, and ORC for efficient storage and processing of semi-structured and unstructured data.
- Experienced in end-to-end API development as a Data Engineer, proficient in designing, implementing, and maintaining robust APIs for data access, ingestion, and integration purposes.
- Skilled in leveraging RESTful principles and modern API frameworks to build scalable and efficient data exchange interfaces, ensuring seamless connectivity and interoperability across diverse systems and applications.
- Proficient in designing and developing data models, ETL processes, and data pipelines using tools like Informatica, Apache Nifi, and DataStage, ensuring accurate and reliable data flow.
- Expertise in utilizing data visualization tools like Tableau and Power BI to create insightful reports and dashboards, providing actionable insights for stakeholders.
- Skilled in Elasticsearch and Apache Solr for building powerful search and analytics applications, leveraging features such as full-text search, real-time indexing, and aggregations for rapid data retrieval and insights generation.
- Experienced in orchestrating seamless migrations from legacy on-premises data integration solutions such as SQL Server Integration Services (SSIS) to modern cloud-based data platforms like Snowflake.
- Proficient in implementing data governance workflows and processes for data discovery, classification, and lineage tracking using tools like Alation and Collibra.
- Skilled in leveraging the capabilities of Azure DevOps for building Docker images, implementing CI/CD pipelines, and ensuring streamlined and automated deployment processes.
- Experienced in orchestrating complex data workflows and pipelines using advanced orchestration tools such as Apache Airflow and Prefect, adept at designing, scheduling, and monitoring data processing tasks with precision and reliability.
- Skilled in configuring and maintaining Jira projects, workflows, and boards, ensuring efficient project planning and tracking within an agile development environment.
TECHNICAL SKILLS:
Languages: Python, Java, Scala, JavaScript (Node.js)
Data Processing & ETL: Azure Databricks, Spark, SQL, Snowflake, Informatica, PySpark, Python, Java, Apache Nifi, Airflow
Machine Learning: Azure ML, Spark MLlib
Data Storage: Azure Data Lake Storage (ADLS), Azure Blob Storage, Snowflake, MongoDB, DynamoDB, HDFS, SQL Server, Redshift, Cosmos DB
Data Warehousing: Snowflake, Teradata, Redshift
Data Visualization: Tableau, Power BI
Version Control: Git
CI/CD: Azure DevOps, Jenkins
Containerization: Docker, Kubernetes
Streaming: Kafka, Spark Streaming, KSQL
Big Data: Hadoop, EMR Cluster, Hive, Pig, Sqoop, Cassandra, Apache Airflow
Database: SQL Server, MongoDB, Cosmos DB, Teradata, Cassandra
Cloud Services: Azure Cloud, AWS (Lambda, Glue, EMR, S3, EC2)
Code Management: Git, Bitbucket, GitHub
Issue Tracking: Jira

WORK EXPERIENCE:

Client: City of Hope, Jersey City, NJ    Feb 2023 - Till date
Role: Data Engineer
Responsibilities:
- Worked with Azure Databricks to ensure scalable and distributed data processing using Apache Spark, designing and implementing Spark jobs for large-scale data transformations and analytics, and utilized Azure Databricks notebooks for interactive data exploration, analysis, and collaborative model development.
- Integrated Alation with other data management tools and technologies, such as data warehouses and ETL pipelines.
- Led end-to-end ETL processes for seamless data integration and implemented governance measures for compliance by analyzing data for patterns and anomalies; improved ETL processes and designed and implemented efficient ETL using Informatica PowerCenter.
- Developed and implemented robust data pipelines using Python for efficient data processing and integration, including the creation of data acquisition scripts to ensure data integrity and reliability in healthcare data integration.
- Utilized PySpark for large-scale data processing, troubleshot Spark applications for enhanced error tolerance, and fine-tuned Spark applications for improved efficiency in Databricks (see the sketch after this section).
- Developed and implemented data pipelines leveraging Azure Data Factory (ADF) and Azure Databricks to preprocess and transform large-scale datasets for training and inference tasks related to RAG (Retrieval-Augmented Generation) architecture and Large Language Models (LLMs).
- Capable of integrating ADF with other Azure services such as Azure SQL Database, Azure Blob Storage, and Azure Synapse Analytics for end-to-end data solutions.
- Gained extensive hands-on experience in designing, developing, and deploying ETL (Extract, Transform, Load) solutions using SQL Server Integration Services (SSIS).
- Led the integration of Meditech electronic health record (EHR) data with Epic's Caboodle data warehouse, establishing seamless data flow and interoperability between disparate healthcare systems to facilitate comprehensive patient care coordination.
- Implemented data governance policies and procedures for managing and securing sensitive healthcare data stored in Epic's Caboodle data warehouse, ensuring compliance with industry regulations such as HIPAA and GDPR.
- Implemented Kafka and Confluent Kafka for real-time data streaming, utilizing KSQL for data analysis.
- Implemented Fivetran data pipelines to automate the extraction and integration of healthcare data from Electronic Health Record (EHR) systems, Practice Management (PM) systems, and other healthcare applications into a centralized data warehouse.
- Leveraged Snowflake features to optimize warehouse performance and storage efficiency, and designed and implemented Snowflake data warehouses for optimal analytical processing performance and scalability.
- Integrated Cypress with continuous integration/continuous deployment (CI/CD) pipelines to automate the execution of end-to-end tests in a variety of test environments.
- Developed interactive and insightful Tableau dashboards and visualizations based on business requirements.
- Utilized Azure DevOps for automated building and deployment of Docker images, playing a key role in the successful implementation and ongoing development of the CI/CD pipeline.
- Applied Kubernetes for orchestrating containerized applications, enhancing scalability and reliability in data processing.
- Designed, developed, and maintained data pipelines using DBT, implemented and maintained DBT macros to automate repetitive data transformation tasks, and utilized DBT for data transformations and data models to ensure data consistency and accuracy.

Environment: Azure Databricks, Spark, SQL, Scala, Akka, Kafka, Snowflake, Informatica, Tableau, PySpark, Python, Alation, DBT, Azure DevOps, Azure services (Azure ML, ADLS, ADF, ADS), Cloudera, Fivetran, Jenkins, KSQL, Kubernetes, Microservices, Prefect.
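For illustration only, a minimal PySpark sketch in the spirit of the Databricks jobs described above: it deduplicates raw records landed in ADLS and writes them back as date-partitioned Parquet. The storage account, container paths, and column names are hypothetical placeholders, not details of the actual engagement.

```python
# Hypothetical PySpark job: deduplicate raw encounter records from ADLS and write
# them back as date-partitioned Parquet. Paths and column names are illustrative only.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("encounter-cleanup").getOrCreate()

raw = spark.read.json("abfss://raw@examplelake.dfs.core.windows.net/encounters/")

cleaned = (
    raw.dropDuplicates(["encounter_id"])                 # drop duplicate events
       .withColumn("admit_date", F.to_date("admit_ts"))  # normalize timestamp to a date
       .filter(F.col("patient_id").isNotNull())          # discard rows with no patient key
)

(cleaned.write
    .mode("overwrite")
    .partitionBy("admit_date")
    .parquet("abfss://curated@examplelake.dfs.core.windows.net/encounters/"))
```

Partitioning the curated layer by date keeps downstream scans selective, which is typically the first lever when tuning batch jobs of this kind.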
Client: Lending Club, Jersey City, NJ    Jan 2021 - Feb 2023
Role: Data Engineer
Responsibilities:
- Worked with AWS Lambda and other AWS services to build serverless applications that leveraged DynamoDB.
- Designed and developed ETL processes in AWS Glue to migrate campaign data from external sources to Redshift.
- Worked with EMR clusters, S3, EC2, and other AWS services such as Redshift for data processing in the AWS cloud.
- Used Python to extract data from Snowflake and upload it to Salesforce, and wrote an AWS Lambda service to deliver real-time data to One-Lake.
- Reviewed code components of developed packages and developed ETLs using PySpark's DataFrame and Spark SQL APIs.
- Created, debugged, scheduled, and monitored Airflow jobs for ETL batch processing to load data into Snowflake (see the sketch after this section).
- Utilized Spark's in-memory capabilities to handle large datasets and applied broadcast variables, efficient joins, and transformations.
- Designed Spark JDBC jobs to transfer user profile data from RDBMS sources to the S3 data lake.
- Implemented RESTful APIs and web services using the Scala Play framework, adhering to industry standards and best practices to ensure interoperability and ease of integration with external systems.
- Performed data transformations using SSIS tasks, including data cleansing, aggregation, and enrichment.
- Implemented error handling and logging mechanisms within SSIS packages to ensure data integrity and traceability.
- Deployed and scheduled SSIS packages using SQL Server Agent and other scheduling tools.
- Used Kinesis Data Analytics SQL for performing real-time analytics and processing on streaming data with standard SQL queries.
- Designed and implemented NoSQL databases such as Cassandra to store large amounts of unstructured and semi-structured data.
- Configured and fine-tuned PostgreSQL database parameters and settings to optimize resource utilization and improve overall system performance in high-volume transactional environments.
- Developed ETL processes using Snowflake's features for efficient data loading and transformation.
- Created DataStage jobs with multiple stages such as Transformer, Aggregator, Sort, Join, Merge, and Lookup.
- Implemented and maintained security best practices within Terraform, such as encryption and access controls, to ensure the security of sensitive data.
- Developed robust error handling and exception management strategies within the migration platform, incorporating automated alerts and notifications to promptly notify stakeholders of critical issues.
- Implemented DynamoDB data indexing to optimize query performance, and managed and monitored DynamoDB performance and capacity to ensure optimal performance and availability.
- Leveraged Cypress's APIs and plugins to integrate end-to-end tests with CI/CD tools such as Jenkins and GitLab CI, enabling rapid feedback loops and early detection of regressions in the software development lifecycle.

Environment: Databricks, Spark, Apache Nifi, Spark-SQL, Scala, Kinesis, Snowflake, Teradata Warehouse, DataStage, PySpark, AWS Glue, PostgreSQL, Python, Alteryx, Airflow, Jenkins.
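A skeletal Airflow DAG illustrating the nightly batch-load pattern referenced above. The DAG id, schedule, and task callables are hypothetical stubs; a real pipeline would stage files and issue a Snowflake COPY INTO inside those callables.

```python
# Hypothetical Airflow DAG skeleton for a nightly batch load into Snowflake.
# Task bodies are stubs, not the production pipeline itself.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def extract_to_stage(**context):
    """Stub: pull yesterday's records from the source system and write them to a stage."""


def copy_into_snowflake(**context):
    """Stub: run COPY INTO against the staged files using the Snowflake connector."""


with DAG(
    dag_id="nightly_snowflake_load",
    start_date=datetime(2023, 1, 1),
    schedule_interval="0 2 * * *",  # run once a day at 02:00
    catchup=False,
) as dag:
    extract = PythonOperator(task_id="extract_to_stage", python_callable=extract_to_stage)
    load = PythonOperator(task_id="copy_into_snowflake", python_callable=copy_into_snowflake)

    extract >> load  # the load runs only after the extract succeeds
```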
Client: State Farm, Bloomington, IL    Feb 2019 - Dec 2020
Role: Data Engineer
Responsibilities:
- Utilized Azure services (ADLS, ADF, HDInsight, Databricks, Synapse SQL) for data applications.
- Developed ADF pipelines to extract, transform, and load data from various sources (Azure SQL, Blob Storage, Azure SQL Data Warehouse).
- Migrated repositories from different source control systems to TFVC and Git in Azure DevOps.
- Created CI/CD pipelines in Azure DevOps and worked with automation tools such as Git, Terraform, and Ansible.
- Implemented large Lambda architectures using the Azure data platform (Data Lake, Data Factory, HDInsight, Azure SQL Server).
- Transferred and transformed large volumes of structured and semi-structured data using Spark and Hive ETL pipelines (see the sketch after this section).
- Architected, developed, and deployed end-to-end ETL pipelines using Talend to ingest, transform, and load large volumes of data from disparate sources into target systems.
- Utilized Talend's graphical interface to design complex ETL workflows, incorporating data cleansing, enrichment, and aggregation processes to ensure data quality and consistency.
- Optimized Hive queries using Hadoop, YARN, Python, PySpark, and best practices.
- Developed Spark applications with Scala and Spark-SQL for testing and processing data.
- Built Spark applications and automated pipelines for bulk and incremental loads of datasets.
- Wrote complex queries and data aggregation pipelines to extract insights from NoSQL databases.
- Developed complex SQL queries and stored procedures in PostgreSQL to support various data analysis and reporting requirements, improving data accessibility and insights generation.
- Implemented PostgreSQL replication solutions, including streaming replication and logical replication, to ensure high availability and disaster recovery capabilities for critical data assets.
- Wrote Flink jobs using the Flink APIs (DataStream API and DataSet API) and Flink SQL for declarative query processing.
- Created JSON scripts for deploying ADF pipelines using the Cosmos activity.
- Developed interactive and visually compelling dashboards and reports using QlikView/Qlik Sense's drag-and-drop interface, providing actionable insights to stakeholders across the organization.

Environment: SQL Server, S3, Hadoop, Apache Nifi, Apache Flink, IBM InfoSphere, YARN, Python, Scala, PySpark, Azure DevOps, Terraform, Ansible, Azure services (ADLS, ADF, HDInsight, Databricks, Synapse SQL), Lambda architectures, QlikView/Qlik Sense, Apache Spark, Spark-SQL, PostgreSQL.
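A minimal sketch of the Spark-and-Hive ETL pattern mentioned above, assuming a Hive metastore is available to Spark; the database, table, and column names are invented for illustration.

```python
# Hypothetical Spark-on-Hive batch step: aggregate a raw Hive table and publish the
# result to a curated Hive table. Database, table, and column names are illustrative.
from pyspark.sql import SparkSession, functions as F

spark = (SparkSession.builder
         .appName("claims-aggregation")
         .enableHiveSupport()   # lets spark.table()/spark.sql() resolve Hive metastore tables
         .getOrCreate())

claims = spark.table("raw_db.claims")

summary = (claims
           .groupBy("policy_id")
           .agg(F.count("*").alias("claim_count"),
                F.sum("claim_amount").alias("total_claim_amount")))

summary.write.mode("overwrite").saveAsTable("curated_db.policy_claim_summary")
```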
Client: Data Bien Analytics, India    Feb 2018 - Jan 2019
Role: Data Engineer
Responsibilities:
- Designed and developed data engineering solutions using Spark, Scala, Azure Data Factory, and Databricks for data cleansing, validation, transformation, and summarization tasks.
- Utilized Azure Data Factory as the primary tool for data ingestion and orchestration from multiple sources.
- Automated data processing jobs with different triggers in Azure Data Factory.
- Used Cosmos DB to store catalog data and implemented user-defined functions, stored procedures, and triggers.
- Analyzed data flow to provide a comprehensive design architecture in Azure.
- Monitored ETL jobs, optimized performance, and troubleshot issues to ensure timely and reliable data processing.
- Collaborated with data modelers and database administrators to ensure Informatica processes align with data structures.
- Implemented in-memory data computation with Spark RDDs to meet business requirements.
- Performed data transformations and manipulations using Azure Data Factory, PySpark, and Databricks.
- Implemented Spark Streaming for mini-batch data processing and real-time data processing using Azure Stream Analytics (see the sketch after this section).
- Developed complex SQL queries and PL/SQL stored procedures to extract, transform, and analyze data from Oracle databases, meeting diverse business requirements.
- Implemented Oracle Data Guard for database replication and disaster recovery, ensuring data availability and minimizing downtime in case of system failures.
- Stayed informed about the latest features and best practices in Snowflake for continuous improvement.
- Conducted training sessions to empower users in creating and interpreting reports through Power BI.
- Collaborated with cross-functional teams to troubleshoot production issues related to DataStage jobs.
- Developed and implemented a microservices architecture for efficient data processing and scalability.
- Orchestrated and managed containerized applications using Kubernetes for seamless deployment and scaling.
- Implemented Kubernetes clusters to enhance system reliability and facilitate easy scaling of data processing tasks.
- Containerized data processing applications using Docker for simplified deployment and isolation.
- Integrated Azure DevOps tools for version control, enabling efficient collaboration and code management.
- Set up centralized logging using the ELK stack (Elasticsearch, Logstash, Kibana) for real-time analysis and debugging.

Environment: Azure Cloud, Azure Data Factory, Azure Databricks, Azure Function Apps, Azure Data Lake, Blob Storage, Teradata Utilities, Windows Remote Desktop, UNIX shell scripting, Azure PowerShell, Oracle, Databricks, Python, Erwin Data Modelling Tool, Azure Cosmos DB, Azure Stream Analytics, Azure Event Hub, Azure Machine Learning.
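The mini-batch idea mentioned above can be illustrated with Spark Structured Streaming's foreachBatch, a newer API than the DStream-based Spark Streaming named in the bullet. The file source, schema, paths, and target table are hypothetical stand-ins for the real event feed.

```python
# Hypothetical Structured Streaming sketch of mini-batch processing with foreachBatch.
# A folder of JSON files stands in for the real event source; names are illustrative.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("minibatch-demo").getOrCreate()

events = (spark.readStream
          .schema("device_id STRING, reading DOUBLE, event_ts TIMESTAMP")
          .json("/landing/events/"))   # pick up newly arriving JSON files

def process_batch(batch_df, batch_id):
    # Each micro-batch arrives as an ordinary DataFrame; aggregate it and append
    # the result to a curated table.
    (batch_df.groupBy("device_id")
             .agg(F.avg("reading").alias("avg_reading"))
             .write.mode("append").saveAsTable("curated.device_readings"))

query = (events.writeStream
         .foreachBatch(process_batch)
         .option("checkpointLocation", "/checkpoints/minibatch-demo")
         .start())

query.awaitTermination()
```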
Client: Senvion, India    Dec 2016 - Jan 2018
Role: Big Data Developer
Responsibilities:
- Designed and implemented scalable and secure data storage solutions on AWS, utilizing services such as Amazon S3 and Amazon RDS, and followed Agile software methodology for project development.
- Developed and optimized ETL processes using AWS Glue for efficient data extraction, transformation, and loading.
- Implemented data pipelines and workflows on AWS, leveraging services like AWS Data Pipeline and Apache Airflow on Amazon EC2.
- Utilized AWS Lambda for serverless computing, automating data processing tasks and ensuring cost-effective solutions.
- Collaborated with data modelers and database administrators to define and maintain data structures suitable for ETL processes.
- Developed and maintained documentation for Informatica workflows, mappings, and transformation logic.
- Implemented data analysis and manipulation using Python libraries such as Pandas and NumPy.
- Developed custom data validation and transformation logic using Java libraries and frameworks to enforce data quality standards, identify data anomalies, and cleanse or enrich data as needed.
- Converted CSV files to JSON using a CSV reader and integrated with SQL Database to create tasks.
- Used MongoDB, a NoSQL database, to store processed data and scheduled tasks using Click Software.
- Optimized Snowflake SQL queries and fine-tuned performance for large-scale data processing.
- Implemented data modeling and relationships in Power BI for accurate and meaningful visualizations.
- Leveraged Ab Initio's graphical development environment to design, develop, and deploy end-to-end ETL workflows for processing large volumes of data from diverse sources.
- Designed and implemented end-to-end data integration and ETL (Extract, Transform, Load) pipelines using Oracle Database technologies such as Oracle Data Integrator (ODI), Oracle GoldenGate, and Oracle Warehouse Builder (OWB).
- Utilized the Spring framework, including Spring Dependency Injection and Spring Web Flow with Spring MVC and Spring Boot, in IntelliJ IDEA for application development.
- Developed a REST API to process data between databases and scheduled Outlook calendar tasks using Quartz and CRON triggers.
- Imported RDBMS data into HDFS using Sqoop, implemented Log4j2 for logging, and developed a REST API in Node.js with an Express service.
- Created a Kafka messaging system with APIs as producers and consumers using a Kafka wrapper (see the sketch after this section).
- Used ZooKeeper to manage offsets for the APIs and prevent message loss during transmission from one API to another in the system.

Environment: Spring framework, Hibernate, REST API, MongoDB, Docker, Jenkins, Core Java, Pig and Hive scripts, Oozie, Flume, Sqoop, Spark machine learning and streaming, Node.js, Kafka, ZooKeeper.
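A small kafka-python sketch of the producer/consumer pattern described above; the broker address, topic, consumer group, and payload fields are illustrative, and the original wrapper library is not reproduced here.

```python
# Hypothetical kafka-python example: one API publishes JSON events, a downstream
# API consumes them. Broker, topic, and payload fields are illustrative.
import json
from kafka import KafkaProducer, KafkaConsumer

TOPIC = "work-orders"

# Producer side: serialize dicts to JSON bytes and publish.
producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)
producer.send(TOPIC, {"order_id": 42, "status": "created"})
producer.flush()

# Consumer side: read from the same topic in a consumer group, committing offsets
# so messages are not lost or reprocessed after a restart.
consumer = KafkaConsumer(
    TOPIC,
    bootstrap_servers="localhost:9092",
    group_id="order-processors",
    enable_auto_commit=True,
    value_deserializer=lambda b: json.loads(b.decode("utf-8")),
)
for message in consumer:
    print(message.value)   # hand the payload off to the processing API here
```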
Client: Keste Software, India    Aug 2015 - Dec 2016
Role: Big Data Developer
Responsibilities:
- Designed and maintained AWS Redshift data warehouses for high-performance analytics and reporting.
- Implemented data security measures on AWS, including encryption at rest and in transit, IAM policies, and VPC configurations.
- Leveraged AWS EMR for processing large-scale data using frameworks like Apache Spark and Apache Hadoop.
- Developed and implemented ETL processes to efficiently extract, transform, and load data from diverse sources into target systems.
- Designed and optimized data workflows to ensure the accuracy, integrity, and quality of data throughout the ETL lifecycle.
- Utilized Informatica tools for metadata management, ensuring comprehensive documentation of data lineage.
- Developed and maintained Python scripts for data extraction, transformation, and loading (ETL) processes.
- Designed and developed scalable, distributed data processing applications using Java and frameworks such as Apache Hadoop, Apache Spark, and Apache Flink to handle large volumes of structured and unstructured data.
- Implemented data ingestion, transformation, and analysis workflows, leveraging Java's multi-threading capabilities and distributed computing frameworks to process data in parallel and achieve high throughput.
- Used technologies such as Hadoop, Apache Spark, Spark Streaming, Spark SQL, HBase, and Scala.
- Used the Spark Streaming API to segregate and categorize data as per requirements and stored the results in HBase.
- Created Hive external tables linked to data in HDFS and accessed them using Spark SQL.
- Installed and set up a multi-node Apache Hadoop and Apache Spark cluster for data storage and processing.
- Analyzed data stored in MS SQL Server 2016 using SQL queries in Management Studio.
- Implemented and maintained security measures in Snowflake, including role-based access control, encryption, and data masking.
- Maintained and updated Power BI datasets, reports, and dashboards to reflect changing business needs.
- Created and managed Git repositories for different projects, ensuring proper access controls.
- Ensured accurate and up-to-date documentation of issues, epics, and user stories in Jira.
- Imported structured data into HDFS via Sqoop and implemented permanent data storage using Cassandra (see the sketch after this section).
- Implemented server-side REST services using Node.js and integrated them with Cassandra for web applications.

Environment: Apache Hadoop, Apache Spark, Spark Streaming, Spark SQL, HBase, Cassandra, Hortonworks Data Platform, Scala, Node.js.
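A brief sketch with the DataStax cassandra-driver showing the kind of Cassandra storage referenced above: a table keyed for its read pattern plus a single insert. The contact point, keyspace, and schema are hypothetical.

```python
# Hypothetical cassandra-driver example: create a time-series table and insert one
# processed record. Contact point, keyspace, table, and columns are illustrative.
from datetime import datetime

from cassandra.cluster import Cluster

cluster = Cluster(["127.0.0.1"])          # Cassandra contact point
session = cluster.connect("analytics")    # assumes the keyspace already exists

session.execute("""
    CREATE TABLE IF NOT EXISTS sensor_readings (
        sensor_id  text,
        reading_ts timestamp,
        value      double,
        PRIMARY KEY (sensor_id, reading_ts)
    ) WITH CLUSTERING ORDER BY (reading_ts DESC)
""")

session.execute(
    "INSERT INTO sensor_readings (sensor_id, reading_ts, value) VALUES (%s, %s, %s)",
    ("turbine-07", datetime(2016, 8, 1, 0, 0), 13.4),
)

cluster.shutdown()
```

Keying the partition around the query path (one partition per sensor, clustered by time) is the usual Cassandra design choice for time-series reads.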
Personal Attributes:
- I possess excellent communication skills and can effectively convey information through various mediums.
- I am highly adaptable and willing to put in extra hours to ensure job success.
- I approach challenges with determination and a strong work ethic.
- I have a strong ability to diagnose and resolve issues efficiently.
- I am constantly seeking opportunities to expand my knowledge and stay up to date with new technologies and tools.

Educational Details:
Bachelor's in ECE from Jawaharlal Nehru Technological University, Kakinada, India - 2015

Certification:
DVA-C02 AWS Certified Developer - Associate
