Candidate's Name
Sr. AWS Data Engineer
EMAIL AVAILABLE | PHONE NUMBER AVAILABLE

PROFESSIONAL SUMMARY
- 10+ years of experience as a Senior AWS Data Engineer specializing in designing, implementing and optimizing data architectures and pipelines.
- Proficient in using AWS services such as Glue for data integration, Amazon RDS and DynamoDB for database management, S3 for scalable storage and Redshift for data warehousing, delivering efficient and scalable data solutions tailored to diverse business needs.
- Proficient in big data frameworks such as Apache Hadoop, Apache Spark, Apache Hive and Presto, with hands-on experience managing HDFS and Spark SQL for large-scale data processing.
- Skilled in data integration and processing tools like Informatica, Talend, Apache NiFi and Kafka, facilitating seamless data flow and transformation across systems.
- Experienced in using AWS services like AWS Lambda, Amazon Athena, Amazon Kinesis and AWS Lake Formation to build and manage serverless data architectures and real-time data processing.
- Strong background in data warehousing and modeling with PostgreSQL, Teradata, MySQL and NoSQL databases such as MongoDB and Cassandra, ensuring effective data storage and retrieval.
- Advanced proficiency in business intelligence and reporting tools including Power BI, Tableau and Excel for creating insightful, actionable data visualizations and reports.
- Proficient in development and management tools such as Docker, Kubernetes, Jenkins and Terraform, enabling efficient deployment and orchestration of data infrastructure.
- Experienced in implementing data security and compliance measures using AWS IAM, AWS KMS and data governance tools, ensuring data protection and regulatory adherence.
- Skilled in Python and Bash scripting for automating data processing tasks and in using tools like Redis for high-performance data caching.
- Experienced with version control systems like Git, GitHub and BitBucket, enabling smooth collaborative development and efficient code management.
- Proven ability to manage and maintain data pipelines and ETL processes, ensuring data integrity and availability for analysis and decision-making.
- Demonstrated expertise in using AWS CloudFormation and AWS CodePipeline for infrastructure as code and continuous integration/continuous deployment (CI/CD) practices.
- Effective communicator and collaborator with extensive experience using JIRA and Confluence for project management and team collaboration.
- Deep understanding of data governance and compliance practices, ensuring adherence to data privacy regulations and organizational policies.
TECHNICAL SKILLS
Cloud Platforms & Services: AWS Glue, Amazon RDS, Amazon DynamoDB, Amazon S3, Amazon Redshift, AWS Lambda, Amazon Athena, Amazon Kinesis, AWS CloudFormation, AWS CodePipeline, AWS IAM, AWS KMS, Amazon EMR, AWS QuickSight, AWS Lake Formation, Amazon SNS, Amazon SQS
Big Data Frameworks: Apache Hadoop, Apache Spark, Apache Hive, Presto, Databricks, Hadoop Distributed File System (HDFS), Spark SQL
Data Integration & Processing Tools: Informatica, Talend, Apache NiFi, Redis, Snowflake, Kafka, Python, Bash, Terraform, PowerShell
Development & Management Tools: Docker, Kubernetes, Jenkins, Git, GitHub, BitBucket, JIRA, Confluence
Data Warehousing & Modeling: PostgreSQL, Teradata, MySQL, NoSQL (MongoDB, Cassandra)
Business Intelligence & Reporting: Power BI, Tableau, Excel
Security & Compliance: AWS IAM, AWS KMS, data governance tools
Communication & Collaboration: JIRA, Confluence

WORK EXPERIENCE
Client & Location: Vanguard, Malvern, Pennsylvania    October 2023 - Present
Sr. AWS Data Engineer
- Developed and managed over 50 ETL pipelines using AWS Glue, reducing data processing time by 30% and enhancing data transformation efficiency.
- Designed and optimized data models for relational (Amazon RDS) and NoSQL (Amazon DynamoDB) databases to support business needs.
- Implemented data lakes on AWS, utilizing Amazon S3 for scalable data storage and management.
- Created and maintained data warehousing solutions in Amazon Redshift, optimizing schema design and performance tuning to achieve a 20% reduction in query execution time.
- Leveraged AWS Lambda for serverless ETL processes, reducing operational overhead and improving efficiency.
- Integrated data from various sources into Amazon Redshift and Amazon DynamoDB, ensuring data consistency and availability.
- Designed and managed large-scale data processing and analytics pipelines with Apache Hadoop and Apache Spark, driving a 50% increase in actionable insights for decision-making.
- Utilized Amazon Athena to run SQL queries directly on S3 data, reducing data exploration time by 30% and enabling efficient analysis of over 10 TB of data (see the sketch after this role's tech stack).
- Configured and monitored real-time data streaming with Amazon Kinesis, supporting live data applications with 99.9% uptime and real-time processing capabilities.
- Developed containerized applications using Docker and orchestrated them with Kubernetes for scalable data processing solutions.
- Implemented Infrastructure as Code (IaC) with AWS CloudFormation to automate AWS resource provisioning and management.
- Automated CI/CD pipelines using AWS CodePipeline for streamlined deployment and continuous integration.
- Utilized Apache Hive and Presto for SQL-based querying and analysis of large datasets stored in data lakes, improving data retrieval speeds by 25%.
- Designed and executed data integration strategies, unifying data from 15+ diverse sources and enabling comprehensive analytics across the organization.
- Developed Python scripts for data processing and automation, enhancing workflow efficiency by 30% and ensuring data accuracy across multiple projects.
- Configured and managed AWS IAM roles and policies to enforce secure access controls and data protection.
- Implemented encryption using AWS KMS to safeguard data in transit and at rest, adhering to security best practices.
- Developed and optimized business intelligence reports and dashboards using AWS QuickSight for actionable insights.
- Maintained high availability and performance of data processing systems by using Amazon EMR for Hadoop-based big data processing.
- Applied Agile methodologies and Scrum practices, leading to a 20% increase in project delivery speed and improved collaboration across cross-functional teams.
- Monitored and managed data workflows with Apache Spark, optimizing job scheduling and resource management, which resulted in a 30% reduction in processing times.
- Developed data models for efficient querying and reporting, improving data accessibility and user experience.
- Designed and implemented notification and message queuing systems using Amazon SNS and SQS for event-driven architecture.
- Conducted data quality assessments and improvements to ensure the accuracy and reliability of data in data lakes and warehouses.
- Developed real-time data processing solutions with Amazon Kinesis for timely insights and decision-making.
- Created and managed Docker containers and Kubernetes clusters, enhancing scalability and resilience by 40% for data engineering solutions.
- Automated infrastructure provisioning and deployment with AWS CloudFormation, improving operational efficiency by 45%.
- Built and maintained comprehensive data integration frameworks, connecting 20+ disparate data sources and enabling seamless data flow across the organization.
- Configured and used AWS services, including Amazon RDS and Amazon DynamoDB, to support data-driven applications for various use cases.
- Worked closely with cross-functional teams using JIRA to track project progress and ensure alignment with business goals.
Tech Stack: AWS Glue, Amazon RDS, Amazon DynamoDB, Amazon S3, Amazon Redshift, AWS Lambda, Apache Hadoop, Apache Spark, Amazon Athena, Amazon Kinesis, Docker, Kubernetes, AWS CloudFormation, AWS CodePipeline, Apache Hive, Presto, Python, AWS IAM, AWS KMS, AWS QuickSight, Amazon EMR, Agile, Scrum, Amazon SNS, Amazon SQS, JIRA
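The Athena-over-S3 querying referenced above can be illustrated with a short boto3 sketch. This is a minimal example, not the project's actual code: the database, table and output bucket names are hypothetical placeholders.

```python
"""Minimal sketch: ad-hoc Athena query against data in S3 via boto3 (names are hypothetical)."""
import time
import boto3

athena = boto3.client("athena", region_name="us-east-1")

def run_athena_query(sql: str, database: str, output_s3: str) -> list[dict]:
    # Submit the query; Athena writes result files to the given S3 location.
    execution = athena.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_s3},
    )
    query_id = execution["QueryExecutionId"]

    # Poll until the query finishes (production code would add a timeout and backoff).
    while True:
        state = athena.get_query_execution(QueryExecutionId=query_id)["QueryExecution"]["Status"]["State"]
        if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
            break
        time.sleep(2)
    if state != "SUCCEEDED":
        raise RuntimeError(f"Athena query {query_id} ended in state {state}")

    # Fetch the first page of results; the first row holds the column headers.
    rows = athena.get_query_results(QueryExecutionId=query_id)["ResultSet"]["Rows"]
    header = [col["VarCharValue"] for col in rows[0]["Data"]]
    return [dict(zip(header, [c.get("VarCharValue") for c in r["Data"]])) for r in rows[1:]]

if __name__ == "__main__":
    print(run_athena_query(
        "SELECT event_type, COUNT(*) AS cnt FROM events GROUP BY event_type",  # hypothetical table
        database="analytics_db",                   # hypothetical Glue catalog database
        output_s3="s3://example-athena-results/",  # hypothetical results bucket
    ))
```

Because Athena reads the S3 objects in place, this kind of exploration avoids loading data into Redshift first; only the query results land in the output bucket.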
Client & Location: JPMorgan Chase, Wilmington, Delaware    April 2022 - September 2023
Sr. Data Engineer
- Developed and optimized complex SQL queries for advanced data manipulation, query optimization and performance tuning, ensuring efficient data operations.
- Designed and implemented intricate data models, including star schemas, supporting data warehousing and analytical processes, which enhanced data retrieval efficiency by 30%.
- Built and maintained robust ELT pipelines using Informatica, reducing data extraction, transformation and loading time by 25% and streamlining data workflows across the organization.
- Applied data warehousing concepts such as partitioning, indexing and lifecycle management to enhance data storage and retrieval efficiency.
- Utilized Amazon S3 for advanced data storage solutions, including lifecycle policies and cross-region replication for data redundancy and availability.
- Leveraged AWS Lambda for scalable, event-driven data processing, automating workflows and enhancing system responsiveness by 50% (see the sketch after this role's tech stack).
- Administered Amazon RDS and Aurora databases, focusing on backup strategies and performance optimization, resulting in a 20% increase in database reliability and management efficiency.
- Implemented AWS Glue for data cataloging, ETL processes and job management, facilitating seamless data integration and transformation for 10+ business units.
- Managed Amazon Redshift clusters, performing performance tuning and optimization to support large-scale data analytics.
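The event-driven Lambda processing and Glue integration described above might look roughly like the handler below. The S3 record structure follows the standard S3 event notification format; the Glue job name and its argument are hypothetical placeholders.

```python
"""Minimal sketch: S3-triggered Lambda that hands new objects to a (hypothetical) Glue job."""
import json
import urllib.parse
import boto3

s3 = boto3.client("s3")
glue = boto3.client("glue")

def lambda_handler(event, context):
    # One invocation can carry several S3 records; start a Glue job run per new object.
    started = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])

        # Check lightweight object metadata before deciding to process it.
        head = s3.head_object(Bucket=bucket, Key=key)
        if head["ContentLength"] == 0:
            continue  # skip empty marker objects

        # Delegate the heavy transformation to Glue; the job name is a placeholder.
        run = glue.start_job_run(
            JobName="raw-to-curated-etl",
            Arguments={"--source_path": f"s3://{bucket}/{key}"},
        )
        started.append(run["JobRunId"])

    return {"statusCode": 200, "body": json.dumps({"glue_job_runs": started})}
```

Keeping the Lambda thin and delegating transformation to Glue keeps the function well within Lambda's execution time and memory limits.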
- Executed ad-hoc querying over datasets exceeding 10 TB using Amazon Athena, providing real-time insights and supporting data-driven decision-making across multiple teams.
- Applied Amazon EMR for big data processing with Hadoop, Spark and Hive, enabling large-scale data analysis and management, which improved processing speeds by 40%.
- Configured and managed AWS Lake Formation, establishing and overseeing data lakes and ensuring structured, secure storage for over 5 PB of data.
- Orchestrated data pipeline workflows using AWS Step Functions, improving the efficiency and reliability of data processing tasks.
- Developed advanced Python scripts for data manipulation and automation, utilizing the AWS SDK for Python (Boto3) for streamlined cloud operations.
- Created data processing applications in Scala with Apache Spark, enhancing real-time data analytics capabilities.
- Managed real-time data streaming and Kafka clusters, ensuring effective data ingestion and processing for dynamic data environments.
- Implemented CI/CD pipelines using Jenkins, automating deployment processes and maintaining continuous integration and delivery, reducing deployment time by 40%.
- Utilized Terraform for Infrastructure as Code (IaC), automating and managing cloud infrastructure provisioning and configuration, which improved deployment efficiency by 45%.
- Developed and maintained ELK stack solutions for comprehensive data logging, analysis and visualization, improving monitoring and troubleshooting processes by 30%.
- Configured AWS IAM roles, policies and permissions to ensure secure and compliant access to data and cloud resources.
- Developed interactive dashboards and reports with Power BI, delivering actionable insights and enhancing business intelligence efforts.
- Collaborated in Agile environments using JIRA and Confluence, following Scrum methodologies, resulting in a 20% increase in the quality and delivery speed of data engineering solutions.
- Worked cross-functionally with data scientists, software engineers and business stakeholders to align data engineering projects with organizational goals.
- Ensured data integrity and accuracy through rigorous testing and validation processes, supporting reliable data-driven decision-making.
- Optimized and maintained data pipelines and data processing workflows, focusing on efficiency, scalability and performance.
Tech Stack: SQL, Informatica, Amazon S3, AWS Lambda, Amazon RDS, Aurora, AWS Glue, Amazon Redshift, Amazon Athena, Amazon EMR, Hadoop, Spark, Hive, AWS Lake Formation, AWS Step Functions, Python, AWS SDK (Boto3), Scala, Apache Spark, Kafka, Jenkins, Terraform, ELK stack, AWS IAM, Power BI, Tableau, JIRA, Confluence

Client & Location: State Farm, Bloomington, Illinois    November 2019 - March 2022
Data Engineer
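The kind of Snowflake load automation described in the next role's bullets can be sketched with the Snowflake Python connector. Account settings, stage and table names below are hypothetical, and credentials would normally come from a secrets manager rather than environment variables.

```python
"""Minimal sketch: loading staged files into Snowflake with the Python connector (names are hypothetical)."""
import os
import snowflake.connector

def load_daily_extract(target_table: str, stage_path: str) -> int:
    """Run COPY INTO for one staged path and return the number of files processed."""
    conn = snowflake.connector.connect(
        account=os.environ["SNOWFLAKE_ACCOUNT"],   # hypothetical environment configuration
        user=os.environ["SNOWFLAKE_USER"],
        password=os.environ["SNOWFLAKE_PASSWORD"],
        warehouse="ETL_WH",
        database="ANALYTICS",
        schema="RAW",
    )
    try:
        cur = conn.cursor()
        # COPY INTO pulls the staged files into the target table; FORCE = FALSE makes
        # reruns idempotent because files already loaded are skipped.
        cur.execute(
            f"COPY INTO {target_table} FROM {stage_path} "
            "FILE_FORMAT = (TYPE = CSV SKIP_HEADER = 1) FORCE = FALSE"
        )
        results = cur.fetchall()  # one result row per staged file processed
        return len(results)
    finally:
        conn.close()

if __name__ == "__main__":
    print(load_daily_extract("CLAIMS_RAW", "@claims_stage/2022-01-01/"))  # hypothetical names
```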
- Employed Python for scripting and automating data tasks, enhancing data manipulation and process efficiency.
- Configured and managed networking in AWS, including VPCs, subnets and security groups, to maintain secure and efficient cloud environments.
- Automated data tasks and reporting processes using Python, enhancing data accuracy and reducing manual effort.
- Designed and executed complex ETL processes to integrate and process data from diverse sources into target systems.
- Managed and maintained data solutions using Snowflake, ensuring high performance and availability of data storage (see the sketch above).
- Utilized JIRA for tracking project tasks, managing issues and ensuring timely delivery of data engineering solutions.
- Developed and maintained data pipelines for continuous data integration and delivery, ensuring data quality and consistency.
- Leveraged Informatica for building and managing ETL workflows, ensuring accurate and timely data processing.
- Implemented and maintained secure and compliant data solutions, adhering to best practices in data governance and security.
- Utilized AWS services, including Amazon S3, AWS Lambda and Amazon RDS, to support cloud-based data solutions.
- Integrated Apache Spark for large-scale data processing, enabling advanced analytics and data transformation.
- Applied data modeling concepts to design and implement efficient schemas for optimized data storage and retrieval.
- Collaborated with Agile teams to deliver data engineering solutions in iterative and incremental phases.
- Optimized SQL queries and ETL processes to enhance data processing efficiency and boost overall system performance.
- Automated data workflows and pipelines to streamline data integration and processing, improving overall efficiency.
- Participated in Scrum ceremonies, including daily stand-ups, sprint planning and retrospectives, to drive project progress.
- Utilized Git for version control, ensuring effective collaboration and code management in data engineering projects.
- Designed and implemented data models, including normalized and denormalized schemas, to support robust and scalable data architectures.
- Monitored and optimized data workflows and pipelines to ensure high performance and reliability.
- Utilized ETL tools and automated data pipelines to streamline data integration and processing workflows.
Tech Stack: Python, AWS (S3, EC2, Lambda, RDS), Snowflake, JIRA, Informatica, Apache Spark, data modeling, Agile, Scrum, SQL, Git

Client & Location: Target Corporation, Minneapolis, Minnesota    June 2017 - October 2019
Junior Data Engineer
- Utilized PostgreSQL for relational database management and data analysis, ensuring effective data storage and retrieval.
- Developed Python scripts for automation and data manipulation, leveraging libraries such as Pandas and NumPy to streamline data processing tasks.
- Implemented Bash scripting on Unix/Linux systems to automate routine tasks and improve operational efficiency.
- Applied data warehousing concepts, including star schemas and ETL processes, to design and implement effective data models.
- Leveraged Apache Spark for large-scale data processing and analytics, implementing Spark SQL for querying and transforming data (see the sketch after this role's tech stack).
- Integrated Redis for caching and optimizing data access, enhancing system performance and response times.
- Utilized AWS services such as S3 for storage, Lambda for serverless computing, RDS for database management and EMR for big data processing.
- Maintained and versioned code using Git, ensuring effective collaboration and management of code repositories.
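The Spark SQL querying and transformation work mentioned above could be sketched as a small PySpark job. The input and output paths, job name and column names are assumed for illustration only.

```python
"""Minimal sketch: a Spark SQL roll-up written as a PySpark job (paths and columns are hypothetical)."""
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("orders-daily-rollup")  # hypothetical job name
    .getOrCreate()
)

# Read the raw extract (placeholder path) and expose it to Spark SQL as a temp view.
orders = (
    spark.read
    .option("header", "true")
    .option("inferSchema", "true")
    .csv("s3a://example-raw/orders/")
)
orders.createOrReplaceTempView("orders")

# Aggregate with Spark SQL, then write partitioned Parquet for the downstream warehouse load.
daily = spark.sql("""
    SELECT order_date,
           store_id,
           COUNT(*)         AS order_count,
           SUM(order_total) AS revenue
    FROM orders
    GROUP BY order_date, store_id
""")
daily.write.mode("overwrite").partitionBy("order_date").parquet("s3a://example-curated/orders_daily/")

spark.stop()
```

Partitioning the output by order_date keeps later queries that filter on date from scanning the full dataset.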
- Worked with Teradata for large-scale data storage and analytical processing, supporting data-driven decision-making.
- Applied data modeling techniques to support efficient data organization and retrieval, improving analytical capabilities.
- Configured and managed Apache NiFi for data flow automation, enabling smooth data ingestion and processing workflows.
- Developed ETL pipelines to extract, transform and load data from various sources into data warehouses for analysis.
- Worked with team members to define data requirements and create solutions that align with both business goals and technical specifications.
- Utilized SQL for querying, joining and manipulating databases, generating insights and supporting business needs.
- Employed Apache Hive for data warehousing and SQL-like querying on Hadoop, facilitating complex data analysis.
- Implemented data integration concepts and tools to consolidate and manage diverse data sources, including data lakes and data marts.
Tech Stack: PostgreSQL, Python (Pandas, NumPy), Bash, Apache Spark, Spark SQL, Redis, AWS (S3, Lambda, RDS, EMR), Git, Teradata, Apache NiFi, SQL, Apache Hive

Client & Location: Agarwal Foundries Pvt Ltd, Andhra Pradesh    June 2013 - November 2015
Data Analyst
- Utilized SQL and MySQL to query and analyze complex datasets, ensuring accurate and efficient data retrieval for business insights.
- Developed and executed Python scripts for data manipulation, transformation and cleaning, including handling missing data and implementing data transformation processes (see the sketch after this role's tech stack).
- Applied statistical methods such as regression analysis, hypothesis testing and descriptive statistics to derive actionable insights and support data-driven decision-making.
- Created and maintained PivotTables and performed advanced data analysis using Excel functionality, including VLOOKUP, macros and the Data Analysis ToolPak.
- Designed and built interactive dashboards and visualizations in Tableau to effectively communicate data trends and findings to stakeholders.
- Managed ETL processes using tools like Talend, ensuring efficient extraction, transformation and loading of data across various systems.
- Utilized Hadoop for big data processing, employing the Hadoop Distributed File System (HDFS) to handle and analyze large volumes of data effectively.
- Implemented data cleaning strategies to enhance data quality and consistency, addressing issues such as missing or erroneous data.
- Partnered with cross-functional teams to establish data requirements and provide insights using data analysis and reporting tools.
- Utilized AWS services for data storage and processing, integrating cloud-based solutions to support scalable data analysis and management.
- Tracked and managed project tasks and progress using JIRA, ensuring timely completion of data analysis projects and adherence to deadlines.
- Maintained version control and code management with Git, facilitating collaborative development and tracking of changes in data analysis scripts and projects.
Tech Stack: SQL, MySQL, Python, Excel, Tableau, Talend, Hadoop, HDFS, AWS, JIRA, Git
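The pandas-based cleaning and basic statistical analysis described above can be sketched as follows; the CSV file and column names are hypothetical placeholders, not data from the engagement.

```python
"""Minimal sketch: pandas cleaning plus simple least-squares regression (file and columns are hypothetical)."""
import pandas as pd
import numpy as np

# Load the raw extract and coerce obviously malformed values to NaN.
df = pd.read_csv("sales_extract.csv")  # hypothetical file
df["order_date"] = pd.to_datetime(df["order_date"], errors="coerce")
df["quantity"] = pd.to_numeric(df["quantity"], errors="coerce")
df["unit_price"] = pd.to_numeric(df["unit_price"], errors="coerce")

# Handle missing data: drop rows without a date or price, impute quantity with the median.
df = df.dropna(subset=["order_date", "unit_price"])
df["quantity"] = df["quantity"].fillna(df["quantity"].median())

# Descriptive statistics for the business report.
print(df[["quantity", "unit_price"]].describe())

# Simple least-squares regression of revenue on quantity (numpy.polyfit, degree 1).
revenue = df["quantity"] * df["unit_price"]
slope, intercept = np.polyfit(df["quantity"], revenue, 1)
print(f"revenue ~ {slope:.2f} * quantity + {intercept:.2f}")
```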