Candidate's Name
Senior Data Engineer | Harrisburg, PA
PHONE NUMBER AVAILABLE | EMAIL AVAILABLE | LINKEDIN LINK AVAILABLE

PROFESSIONAL SUMMARY:
- Over 8 years of experience as a Data Engineer, focusing on data architecture and system optimization.
- Led teams of data engineers and analysts, providing mentorship, coordinating project efforts, and ensuring successful delivery of data solutions.
- Experience with Amazon Web Services including S3, IAM, EC2, EMR, Kinesis, VPC, Lambda, Athena, Glue, DMS, QuickSight, Elastic Load Balancing, Auto Scaling, CloudWatch, SNS, SQS, and other services in the AWS family.
- Expertise with AWS databases such as RDS (Aurora), Redshift, DynamoDB, and ElastiCache (Memcached and Redis).
- Experience developing Hadoop-based applications using HDFS, MapReduce, Spark, Hive, Sqoop, HBase, and Oozie.
- Hands-on experience architecting legacy data migration projects from on-premises systems to the AWS Cloud.
- Wrote AWS Lambda functions in Python that invoke scripts to perform transformations and analytics on large data sets in EMR clusters.
- Skilled in building and fine-tuning AWS data pipelines, architectures, and datasets.
- Worked with Apache Spark, executing actions and transformations on RDDs, DataFrames, and Datasets using Spark SQL and Spark Streaming contexts.
- Substantial experience with Spark Core, Spark SQL, and Spark Streaming.
- Adept in multiple Software Development Life Cycle (SDLC) models, including Waterfall, V-Model, and Agile methodologies.
- Provided training and knowledge-transfer sessions to internal teams on Teradata database administration, development, and best practices, fostering self-sufficiency and continuous learning within the organization.
- Experienced in crafting complex SQL queries in MySQL for extract, transform, and load (ETL) processes, with an emphasis on query performance and efficiency.
- Knowledgeable in data migration and integration projects, with a proven record of minimizing downtime and maintaining data integrity during system migrations.
- Tested data pipelines and infrastructure for accuracy, reliability, and performance, using testing frameworks and methodologies to maintain data quality standards.
- Utilized Docker and Terraform to create reproducible and scalable cloud infrastructure, enhancing deployment processes and environment consistency.
- Skilled in scripting with Python and using Subversion (SVN) for version control, enhancing code management and collaborative development.
- Developed robust data warehousing solutions using Snowflake, optimizing data storage and retrieval processes for scalability.
- Designed dynamic data visualization solutions using Tableau and QlikView, facilitating insightful business reporting and decision-making.
- Led Teradata database migration projects, including planning, execution, and post-migration validation, ensuring seamless transitions and minimal downtime.
- Documented Teradata database configurations, procedures, and troubleshooting guides to facilitate knowledge sharing and ensure system stability.
- Worked with other big data technologies such as Apache Hive, Presto, Trino, and AWS Athena to query Iceberg tables, providing diverse data access methods for analytics and reporting.
- Stayed current with the latest Teradata features and technologies, proactively recommending enhancements and optimizations to drive continuous improvement in data management and analytics capabilities.
- Engineered data pipelines using SSIS and AWS Data Pipeline for effective data migration and integration in healthcare systems.
- Designed and maintained scalable data architectures with AWS Redshift, ensuring efficient data storage and high performance.

TECHNICAL SKILLS:
Programming Languages: Python, SQL, Scala
AWS Services: S3, IAM, EC2, EMR (Elastic MapReduce), Kinesis, VPC, Lambda, Athena, Glue, Glue Studio, DMS, QuickSight, Elastic Load Balancing, Auto Scaling, CloudWatch, SNS, SQS, Step Functions, Data Pipeline, CloudFormation, Serverless Application Model, AWS Transfer Family, DataSync, EventBridge
Databases: MySQL, PostgreSQL, Oracle Database, MongoDB, DynamoDB, Teradata
Data Visualization Tools: Tableau, QlikView
Cloud Monitoring Tools: ELK Stack (Elasticsearch, Logstash, Kibana), Splunk, Datadog, AppDynamics
DevOps Tools: Docker, Kubernetes, Git, Subversion (SVN)
Other Technologies: Apache Kafka, Apache Spark, Iceberg, Apache Hadoop, SSIS, Apache NiFi, Apache Airflow, Hudi, Apache Hive, Azure Data Factory

PROFESSIONAL EXPERIENCE:

Client: Capco, Chicago, IL | Mar 2022 - Present
Role: Sr. Data Engineer
- Implemented a serverless architecture using API Gateway, Lambda, and DynamoDB, and deployed AWS Lambda code from Amazon S3 buckets.
- Hands-on experience working with AWS services including Lambda, Athena, DynamoDB, Step Functions, SNS, SQS, S3, and IAM.
- Designed and implemented IAM policies and roles to manage access control for AWS resources.
- Configured automated backups, snapshots, and multi-AZ deployments for SQL Server, enhancing data protection and disaster recovery capabilities.
- Monitored and troubleshot CI/CD pipelines and deployments using CloudWatch Logs and CodeDeploy events, ensuring successful and reliable application deployments.
- Built data processing pipelines using AWS Glue, Pandas, and NumPy to transform raw data into structured formats for analytics in Amazon Redshift.
- Used Pytest fixtures and the moto library to mock AWS services, creating isolated and repeatable tests for applications interacting with S3, RDS, and DynamoDB.
- Developed ETL workflows with Iceberg to transform and load data from diverse sources into Amazon Redshift, improving data processing efficiency by 50% and enabling comprehensive analytics.
- Configured and optimized Elasticsearch clusters for efficient indexing, search performance, and scalability.
- Deployed and maintained Amazon RDS SQL Server instances, ensuring high availability, automated backups, and disaster recovery.
- Integrated PySpark with data sources such as HDFS, AWS S3, Azure Blob Storage, and relational databases to create seamless data workflows.
- Utilized Amazon EMR for big data processing across a Hadoop cluster on Amazon EC2, S3, and Redshift.
- Used Databricks, built on Apache Spark, to design and manage scalable data workflows, ensuring efficient data processing and analytics.
- Utilized Apache Hudi's support for ACID transactions and data versioning to maintain data consistency and enable time-travel queries, allowing easy rollback and historical data analysis.
- Implemented CI/CD pipelines using Jenkins/GitLab CI with Docker to automate build, test, and deployment processes, enhancing development efficiency and reducing time to market.
- Developed pipelines for batch and streaming data ingestion, transformation, and storage using Delta Lake.
- Created external tables with partitions using Hive, AWS Athena, and Redshift.
- Configured Snowflake network policies to control and secure data access, ensuring compliance with organizational security requirements.
- Developed automated data ingestion processes into Redshift using AWS Glue and Redshift COPY commands, ensuring timely and accurate data updates.
- Designed the solution architecture on AWS, utilizing services such as Amazon EMR for distributed data processing and Amazon S3 for storing raw and processed data, enabling scalability and flexibility.
- Utilized Pandas for data manipulation and analysis in AWS Glue ETL jobs, transforming raw data into structured formats for analytics in Redshift.
- Created AWS Lambda functions and API Gateway endpoints so that data submitted through API Gateway is processed by Lambda.
- Designed and implemented a real-time data processing pipeline using AWS Kinesis, AWS Glue, and Amazon Redshift, reducing data processing latency by 40%.
- Designed and developed ETL processes in AWS Glue to migrate data from external sources such as S3 (ORC/text files) into AWS Redshift.
- Managed Docker container orchestration and scaling using Docker Swarm, ensuring high availability and fault tolerance for critical applications.
- Designed logical and physical data models for various data sources on Redshift.
- Integrated Lambda with DynamoDB and Step Functions to iterate through lists of messages and update their state in a DynamoDB table (see the sketch after this section).
Environment: API Gateway, Lambda, DynamoDB, S3, Certificate Manager, AWS Key Management Service, IAM, Athena, Step Functions, SNS, SQS, Elastic MapReduce, Hudi, Redshift, Glue, ELK Stack (Elasticsearch, Logstash, Kibana), RDS, CloudWatch, EC2, Iceberg, STS, Data Pipeline, CloudFormation, Docker, Hive
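A minimal sketch of the Lambda-plus-Step-Functions pattern referenced above, assuming a boto3 runtime and a Map state that invokes the function once per message; the table name, key schema, and event shape are hypothetical:

```python
import os
import boto3

# Table name comes from the Lambda environment in this sketch (hypothetical).
dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table(os.environ.get("STATE_TABLE", "message-state"))


def handler(event, context):
    """Invoked once per message by a Step Functions Map state.

    Writes the processing status of the message back to DynamoDB so the
    state machine and downstream consumers can track progress.
    """
    message_id = event["message_id"]  # hypothetical event shape
    table.update_item(
        Key={"message_id": message_id},
        UpdateExpression="SET #s = :s, payload = :p",
        ExpressionAttributeNames={"#s": "status"},  # "status" is a reserved word
        ExpressionAttributeValues={":s": "PROCESSED", ":p": event.get("body", {})},
    )
    return {"message_id": message_id, "status": "PROCESSED"}
```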
Client: Pan-American Life Insurance Group, New Orleans, LA | Feb 2020 to Jan 2022
Role: Data Engineer
- Used various AWS services including S3, EC2, AWS Glue, Athena, Redshift, EMR, SNS, DMS, and Kinesis.
- Designed solutions for data ingestion patterns, utilizing AWS services such as AWS Transfer Family and DataSync to efficiently move data across different systems and environments.
- Implemented AWS Glue crawlers and data catalogs to automate schema discovery and metadata management, enhancing data governance and accessibility.
- Configured monitoring and alerting in Kibana using Watcher or similar tools to proactively monitor system logs, metrics, and application events.
- Developed serverless functions using AWS Lambda to handle event-driven processing, automate workflows, and integrate with various AWS services, ensuring scalable and cost-effective solutions.
- Automated data backup and archival processes to S3 using AWS Lambda and CloudWatch Events, improving data durability and availability.
- Wrote Pytest test cases to validate data ingestion and processing in AWS Kinesis, ensuring real-time data integrity and reliability (see the sketch after this section).
- Developed ETL workflows using AWS Glue and Python, leveraging Pandas and NumPy for data transformation and loading into AWS data warehouses.
- Conducted root-cause analysis and implemented corrective actions to minimize downtime.
- Implemented pipelines for parsing, enriching, and indexing diverse data formats into Elasticsearch.
- Utilized Terraform to automate provisioning of AWS cloud resources, ensuring efficient and scalable cloud infrastructure management.
- Configured Kinesis Data Analytics to perform real-time analytics on streaming data, providing actionable insights and enhancing decision-making processes.
- Implemented Kubernetes Helm charts to package, configure, and deploy applications, simplifying the management of complex deployments and versioning.
- Defined policy boundaries to enforce least-privilege access, maintaining security and compliance standards across the AWS environment.
- Integrated Hudi with Apache Spark for scalable data processing, enabling efficient batch and streaming ETL workflows using Spark DataFrames and Datasets.
- Extracted data from multiple source systems (S3, Redshift, RDS) and created tables and databases in the Glue Data Catalog using Glue crawlers.
- Created multiple Glue ETL jobs in Glue Studio, processed the data with various transformations, and loaded it into S3, Redshift, and RDS.
- Configured Kubernetes Ingress controllers to manage external access to services within the cluster, improving application accessibility and load balancing.
- Utilized orchestration services such as Lambda and EventBridge to build workflows for data ETL pipelines, automating and orchestrating data processing tasks seamlessly.
- Demonstrated strong knowledge of AWS services used in monitoring, audit, and governance, ensuring compliance with organizational policies and regulations while maintaining visibility and control over the AWS environment.
Environment: S3, EC2, Glue, AppDynamics, Athena, Redshift, EMR, SNS, DMS, Kinesis, Amazon DynamoDB, IAM, Glue Crawlers, Glue Studio, Hudi, AWS Transfer Family, DataSync, AWS RDS, CloudWatch, Lambda, EventBridge, Kubernetes, Python.
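A minimal sketch of the Pytest-based Kinesis validation mentioned above, assuming moto 5.x (its mock_aws context manager) so the test runs against an in-memory stream instead of real AWS; the stream name, event shape, and helper function are hypothetical:

```python
import json

import boto3
import pytest
from moto import mock_aws

STREAM = "claims-events"  # hypothetical stream name


def put_claim_event(client, event: dict) -> None:
    """Write one claim event to the stream, keyed by member id."""
    client.put_record(
        StreamName=STREAM,
        Data=json.dumps(event).encode("utf-8"),
        PartitionKey=event["member_id"],
    )


@pytest.fixture
def kinesis():
    """Provide a mocked Kinesis client with a single-shard stream."""
    with mock_aws():
        client = boto3.client("kinesis", region_name="us-east-1")
        client.create_stream(StreamName=STREAM, ShardCount=1)
        yield client


def test_event_round_trips_intact(kinesis):
    event = {"member_id": "M-001", "claim_amount": 125.50}
    put_claim_event(kinesis, event)

    # Read the record back through a shard iterator and check data integrity.
    shard_id = kinesis.describe_stream(StreamName=STREAM)["StreamDescription"]["Shards"][0]["ShardId"]
    iterator = kinesis.get_shard_iterator(
        StreamName=STREAM, ShardId=shard_id, ShardIteratorType="TRIM_HORIZON"
    )["ShardIterator"]
    records = kinesis.get_records(ShardIterator=iterator)["Records"]

    assert json.loads(records[0]["Data"]) == event
```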
Client: Advantasure, Southfield, MI | Oct 2017 to Dec 2019
Role: ETL Engineer
- Utilized Python libraries such as Pandas, NumPy, and SciPy for data analysis, manipulation, and visualization.
- Processed, cleaned, and analyzed large datasets using Python's data manipulation tools, performing data aggregation, transformation, and statistical analysis (see the sketch after this section).
- Automated data transformation and loading processes using AWS Data Pipeline, improving throughput and reducing manual intervention.
- Developed Spark applications using Scala and Python for data transformations and cleansing, as well as using the Spark API.
- Configured AWS Glue to streamline ETL operations, enabling scalable and serverless data integration for healthcare applications.
- Orchestrated the implementation and optimization of Teradata database solutions to support high-volume data processing and analytics requirements.
- Designed and implemented Teradata data models, ensuring efficient data storage and retrieval for analytical workloads.
- Developed and maintained Teradata SQL scripts and queries for extract, transform, and load (ETL) processes.
- Collaborated with cross-functional teams to design and execute Teradata database performance tuning strategies, enhancing query efficiency and system throughput.
- Implemented Teradata utilities and tools for backup and recovery procedures, ensuring data integrity and disaster recovery readiness.
- Leveraged CloudWatch Logs to collect, store, and analyze log data generated by AWS services and applications.
- Engineered data pipelines using SSIS and AWS Data Pipeline for effective data migration and integration in healthcare systems.
- Implemented AWS EMR to manage big data frameworks and optimize data processing and analysis capabilities in the cloud.
- Utilized Subversion (SVN) for version control, maintaining code integrity and supporting collaborative development efforts.
Environment: AWS Data Pipeline, AWS Glue, S3, RDS, CloudWatch, CloudWatch Logs, AWS EMR, Python, Pandas, NumPy, SciPy, Scala, Teradata, SSIS, SVN.
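A minimal sketch of the Pandas/NumPy cleaning-and-aggregation work described above; the file path, column names, and claim-oriented schema are hypothetical:

```python
import numpy as np
import pandas as pd


def summarize_claims(path: str) -> pd.DataFrame:
    """Load raw claim records, clean them, and aggregate per provider and month."""
    df = pd.read_csv(path, parse_dates=["service_date"])

    # Basic cleansing: drop exact duplicates and rows missing the paid amount.
    df = df.drop_duplicates().dropna(subset=["paid_amount"])

    # Normalize a free-text status column before grouping.
    df["status"] = df["status"].str.strip().str.upper()

    # Aggregate: claim counts and paid-amount statistics per provider and month.
    return (
        df.assign(month=df["service_date"].dt.to_period("M"))
          .groupby(["provider_id", "month"])
          .agg(claims=("claim_id", "count"),
               total_paid=("paid_amount", "sum"),
               p95_paid=("paid_amount", lambda s: np.percentile(s, 95)))
          .reset_index()
    )
```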
Client: Luxoft India Pvt. Ltd., Bangalore, India | Jan 2016 to Jun 2017
Role: Spark Engineer
- Managed large datasets using Apache Hadoop, enhancing data processing capabilities for complex data sets.
- Developed data architectures using file formats and tables, enhancing data organization and accessibility.
- Participated in project scoping exercises and created requirement documents and source-to-target mappings.
- Worked with data delivery teams to set up new Hadoop users, including testing HDFS, Hive, Pig, and MapReduce access for the new users.
- Participated in designing the ingestion framework for history and incremental loads into the Hadoop file system.
- Performed complex business transformations using Spark SQL and Spark APIs and saved the final datasets into Hive tables (see the sketch after this section).
- Developed ETL data pipelines using the Spark API to fetch data from SQL Server and third-party APIs.
- Migrated SQL Server packages into Spark transformations using Spark RDDs and DataFrames.
- Worked on Data Lake staging and reporting layers, building the data pipeline from ingestion to consumption.
- Created fact, dimension, and summary tables for reporting consumption.
- Developed and designed POCs using PySpark and Hive, deployed them on a YARN cluster, and compared performance with SQL Server modules.
- Improved runtime performance of Spark applications through YARN queue management and memory tuning.
- Used PyCharm and the Spark CLI for development, managed the code repository in Git, and transferred data between HDFS and databases using Sqoop.
Environment: Apache Hadoop, HDFS, Hive, Pig, MapReduce, Spark SQL, Spark APIs, SQL Server, Spark RDDs, DataFrames, Data Lake, PySpark, YARN, Spark CLI, Git, Sqoop, PyCharm.
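A minimal sketch of the SQL Server-to-Hive pattern described above, assuming a Hive-enabled SparkSession and a SQL Server JDBC driver on the classpath; the JDBC URL, credentials, table names, and columns are hypothetical:

```python
from pyspark.sql import SparkSession, functions as F

spark = (
    SparkSession.builder
    .appName("orders-ingest")
    .enableHiveSupport()
    .getOrCreate()
)

# Ingest the source table from SQL Server over JDBC (hypothetical connection details).
orders = (
    spark.read.format("jdbc")
    .option("url", "jdbc:sqlserver://sqlhost:1433;databaseName=sales")
    .option("dbtable", "dbo.orders")
    .option("user", "etl_user")
    .option("password", "<from-secret-store>")
    .load()
)

# Business transformation with the DataFrame API and Spark SQL functions.
daily_revenue = (
    orders.filter(F.col("status") == "COMPLETE")
    .withColumn("order_date", F.to_date("created_at"))
    .groupBy("order_date", "region")
    .agg(F.sum("amount").alias("revenue"), F.count("*").alias("orders"))
)

# Persist the result to a partitioned Hive table for the reporting layer.
(
    daily_revenue.write.mode("overwrite")
    .partitionBy("order_date")
    .saveAsTable("reporting.daily_revenue")
)
```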
Client: Quytech, Gurgaon, India | Aug 2014 to Oct 2015
Role: Data Analyst
- Developed complex SQL queries for data analysis and business reporting, enhancing decision-making processes across departments.
- Automated routine data processing tasks using Python scripting, increasing efficiency and reducing operational delays.
- Created and maintained data dashboards using Tableau, enhancing data accessibility and visualization for strategic decision-making.
- Configured Apache Hive for managing large datasets, enabling efficient data processing and analysis in big data environments.
- Managed version control and collaboration using GitLab, facilitating seamless team collaboration and code management.
- Utilized MySQL Server for database management, optimizing data storage and retrieval operations.
Environment: MySQL, Tableau, Python, GitLab, Apache Hive.

EDUCATION:
Bachelor of Technology in Computer Science | JNTUH, Hyderabad, Telangana, India | 2014
