 Candidate's Name
Senior Data Engineer    PHONE NUMBER AVAILABLE    EMAIL AVAILABLE    LINKEDIN LINK AVAILABLE

PROFESSIONAL SUMMARY
- Proficient Data Engineer with over 10 years of experience in the IT industry and hands-on experience installing, configuring, and using Hadoop ecosystem components such as Hadoop, MapReduce, HDFS, HBase, Zookeeper, Hive, Sqoop, Pig, Flume, Cassandra, Kafka, and Spark. Agile methodologies include Extreme Programming, Scrum, and Test-Driven Development (TDD).
- Extensive experience importing and exporting data using stream-processing platforms like Flume.
- Experience developing MapReduce programs using Apache Hadoop to analyze big data per requirements.
- Strong experience migrating other databases to Snowflake.
- Played a key role in migrating Teradata objects into the Snowflake environment.
- Excellent knowledge of cloud data warehouse systems: AWS Redshift, S3 buckets, and Snowflake.
- Worked with HBase to perform quick look-ups (updates, inserts, and deletes) in Hadoop.
- Experience with the Oozie workflow scheduler to manage Hadoop jobs as Directed Acyclic Graphs (DAGs) of actions with control flows.
- Excellent knowledge of Hadoop architecture, including HDFS, Job Tracker, Task Tracker, Name Node, Data Node, and the MapReduce programming paradigm.
- Experience tuning Spark jobs for efficiency in terms of storage and processing.
- Experienced in integrating Hadoop with Kafka and uploading clickstream data to HDFS.
- Experienced in loading datasets into Hive for ETL (Extract, Transform, and Load) operations.
- Experience developing big data applications and services on the Amazon Web Services (AWS) platform using EMR, S3, EC2, Lambda, and CloudWatch, and cloud computing using AWS Redshift.
- Experience analyzing data using HQL, Pig Latin, and custom MapReduce programs in Python.
- Developed Spark code using Scala and Spark SQL/Streaming for faster testing and processing of data.
- Good understanding of Azure big data technologies such as Azure Data Lake Analytics, Azure Data Lake Store, Azure Databricks, and Azure Data Factory; created a POC moving data from flat files and SQL Server using U-SQL jobs.
- Involved in writing data transformations and data cleansing using Pig operations; good experience in data retrieval and processing using Hive.
- Experience creating Spark Streaming jobs to process huge sets of data in real time.
- Created reports in Power BI using visualizations such as bar charts, clustered column charts, waterfall charts, gauges, pie charts, and tree maps.
- Expertise in relational database systems (RDBMS) such as MySQL, Oracle, and MS SQL, and in NoSQL database systems like HBase, MongoDB, and Cassandra.
- Experience with software development tools such as JIRA, Git, and SVN.
- Experience developing storytelling dashboards and data analytics, designing reports with visualization solutions in Tableau Desktop, and publishing them to Tableau Server.
- Flexible working across operating systems such as Unix/Linux (CentOS, Red Hat, Ubuntu) and Windows environments.
- Extensive experience designing, developing, and deploying various kinds of reports in SSRS using relational and multidimensional data.
- Developed Apache Spark jobs using Scala and Python for faster data processing, using the Spark Core and Spark SQL libraries for querying.
- Proficient in Hive optimization techniques such as buckets and partitions.
- Experience importing and exporting data with Sqoop between relational database systems and HDFS.
- Extensive experience using Maven as a build tool to produce deployable artifacts from source code.

TECHNICAL SKILLS
- Languages: SQL, PL/SQL, Python, Java, Scala, C, HTML, Unix, Linux.
- ETL Tools: AWS Redshift, Alteryx, Informatica PowerCenter, Ab Initio.
- Big Data: HDFS, MapReduce, Spark, Airflow, YARN, NiFi, HBase, Hive, Pig, Flume, Sqoop, Kafka, Oozie, Hadoop, Zookeeper, Spark SQL.
- RDBMS: Oracle 9i/10g/11g/12c, Teradata, MySQL, MS SQL.
- NoSQL: MongoDB, HBase, Cassandra.
- Cloud Platforms: Microsoft Azure, AWS (Amazon Web Services), Snowflake.
- Concepts and Methods: Business Intelligence, Data Warehousing, Data Modeling, Requirement Analysis.
- Data Modeling Tools: Erwin, Power Designer, Embarcadero ER Studio, MS Visio, Star Schema Modeling, Snowflake Schema Modeling, Fact and Dimension Tables.
- Other Tools: Azure Databricks, Azure Data Explorer, Azure HDInsight.
- Operating Systems: UNIX, Windows, Linux.

PROFESSIONAL EXPERIENCE

CLIENT: EDWARD JONES, ST. LOUIS, MO.
ROLE: SENIOR BIG DATA ENGINEER
DATE: JUNE 2022 - PRESENT
DESCRIPTION:
Edward Jones is a Fortune 500 company headquartered in St. Louis, Missouri, that provides financial services through personalized service to individual investors. As a Senior Big Data Engineer, I design and implement data pipelines for the transformation and analysis of large data volumes. Automating ETL processes, including real-time data streaming, contributes to data-driven decision-making and the development of insightful reports.
RESPONSIBILITIES:
- Built and optimized data pipelines for ETL processes, leveraging AWS services including EMR, EC2, S3, RDS, Lambda, Kinesis, Glue, SQS, Redshift, ECS, and AWS Databricks for large-scale data transformations and analytics.
- Collaborated cross-functionally on ETL tasks, ensuring data integrity and pipeline stability through automated checks and monitoring tools.
- Implemented CI/CD workflows using Git, Jenkins, and Docker, establishing a robust big data architecture on AWS.
- Actively involved in the entire SDLC, from analysis and planning to post-production performance tuning.
- Developed Python functions and automated ETL pipelines, migrating data from DynamoDB to Redshift using SQL and Python for seamless data transitions.
- Leveraged AWS EMR for large-scale ETL operations on datasets from S3, DynamoDB, and MongoDB, enabling efficient data storage and retrieval processes.
- Automated the loading of data from S3 to Redshift using AWS Data Pipeline, incorporating JSON schemas for accurate table and column mappings.
- Engineered real-time data pipelines using Kafka and Spark Streaming for HBase data consumption, and developed Scala-based Spark-Kafka integrations (see the sketch after this section).
- Created and optimized Spark SQL and PySpark applications to process and transform data, using Apache Spark on AWS EMR and Databricks.
- Integrated Databricks with AWS for unified analytics across data lakes and data warehouses, improving the performance of large-scale data processing and machine learning workflows.
- Processed near-real-time data from AWS S3 using Spark Streaming to build and maintain learner data models, persisting outputs in HDFS.
- Collaborated with cross-functional teams using Databricks, Airflow, and AWS Glue for scalable data pipeline orchestration, data integration, and automation.
- Created interactive, data-driven reports using Tableau, integrated with Alteryx for advanced data analytics.
- Managed AWS security groups with a focus on high availability, fault tolerance, and auto-scaling, using Terraform templates along with CI/CD, AWS Lambda, and AWS CodePipeline.
- Worked with the Data Analyst team to resolve data issues in existing data sources and published new data sources in Tableau.
ENVIRONMENT: AWS (EMR, EC2, S3, RDS, Lambda, Kinesis, Glue, SQS, Redshift, ECS, Databricks), Python, SQL, DynamoDB, MongoDB, Apache Spark, Spark Streaming, PySpark, Scala, Kafka, HDFS, Tableau, Alteryx, Jenkins, Git, Docker, Terraform, Apache Airflow.
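A minimal sketch of the kind of Kafka-to-storage streaming pipeline described above, written with PySpark Structured Streaming. This is illustrative only, not the project's actual code: the broker address, topic name, message schema, and S3 paths are hypothetical, and it assumes the Spark Kafka connector package is available on the cluster.

```python
# Illustrative sketch only: topic, broker, schema, and paths are hypothetical placeholders.
# Requires the spark-sql-kafka connector package on the cluster classpath.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, from_json
from pyspark.sql.types import StructType, StructField, StringType, DoubleType

spark = SparkSession.builder.appName("clickstream-ingest").getOrCreate()

# Assumed JSON layout of each Kafka message.
schema = StructType([
    StructField("user_id", StringType()),
    StructField("event", StringType()),
    StructField("amount", DoubleType()),
])

# Read the Kafka topic as a streaming DataFrame and parse the JSON payload.
events = (spark.readStream
          .format("kafka")
          .option("kafka.bootstrap.servers", "broker:9092")
          .option("subscribe", "events")
          .load()
          .select(from_json(col("value").cast("string"), schema).alias("e"))
          .select("e.*"))

# Persist micro-batches as Parquet on S3 (an HDFS path works the same way).
query = (events.writeStream
         .format("parquet")
         .option("path", "s3://example-bucket/curated/events/")
         .option("checkpointLocation", "s3://example-bucket/checkpoints/events/")
         .outputMode("append")
         .start())

query.awaitTermination()
```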
CLIENT: HUMANA, CHARLOTTE, NC.
ROLE: BIG DATA ENGINEER
DATE: MAY 2021 - MAY 2022
DESCRIPTION:
Humana is a leading healthcare and insurance company specializing in health and wellness for individuals and businesses. As a Big Data Engineer, my main responsibilities revolved around the design and implementation of data solutions, with a focus on optimizing large-scale data processing and storage.
RESPONSIBILITIES:
- Designed, developed, and supported Data Lake and BI applications.
- Led the migration from Teradata to AWS Redshift, integrating DBT for transformation management.
- Set up a Snowflake cloud data warehouse with DBT integration for efficient data modeling and established a multi-cluster architecture for optimal performance.
- Leveraged Snowflake's secure data-sharing capabilities to facilitate real-time collaboration with stakeholders.
- Architected data warehouse solutions and BI platforms to enhance analytics capabilities.
- Developed serverless AWS Lambda functions in Python and YAML for automated workflows.
- Managed AWS services including EC2, VPC, S3, SNS, CloudWatch, and Glue ETL to streamline data processing.
- Implemented CI/CD pipelines using AWS CodePipeline for continuous integration and delivery.
- Optimized Scala/Spark jobs by leveraging efficient DataFrame partitioning for improved performance.
- Executed DevOps practices with Git, Jenkins, Docker, and AWS tools for streamlined deployment processes.
- Created PySpark APIs for complex data transformations adhering to business intelligence rules.
- Configured AWS Lambda functions to monitor S3 bucket metrics and perform storage analytics (see the sketch after this section).
- Collaborated with teams using AWS CodeCommit, CodeDeploy, CodeBuild, and Git repositories for version control and deployment.
- Implemented data quality checks and governance frameworks to ensure data integrity and compliance.
- Conducted performance tuning of SQL queries and data pipelines to optimize runtime efficiency.
- Engaged with cross-functional teams and stakeholders to gather requirements and deliver tailored data solutions.
- Established monitoring and reporting frameworks using AWS CloudWatch and QuickSight for actionable insights.
- Developed comprehensive documentation for data solutions and provided training sessions for team members to ensure knowledge transfer.
ENVIRONMENT: Kafka, Impala, PySpark, Snowflake, Databricks, Data Lake, Apache Beam, Cloud Shell, Tableau, Cloud SQL, MySQL, Python, Scala, Spark, Spark SQL, NoSQL, MongoDB, Jira.
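As one concrete illustration of the S3 storage-analytics work noted above, the following is a minimal, hypothetical Lambda-style handler that reads the daily BucketSizeBytes metric S3 publishes to CloudWatch; the bucket name and return format are placeholders, not the actual project code.

```python
# Illustrative Lambda-style handler; the bucket name and return shape are hypothetical.
import datetime
import boto3

cloudwatch = boto3.client("cloudwatch")

def handler(event, context):
    """Report the most recent daily BucketSizeBytes value for an S3 bucket."""
    now = datetime.datetime.utcnow()
    resp = cloudwatch.get_metric_statistics(
        Namespace="AWS/S3",
        MetricName="BucketSizeBytes",
        Dimensions=[
            {"Name": "BucketName", "Value": "example-data-lake"},
            {"Name": "StorageType", "Value": "StandardStorage"},
        ],
        StartTime=now - datetime.timedelta(days=2),
        EndTime=now,
        Period=86400,            # S3 publishes this metric once per day
        Statistics=["Average"],
    )
    points = sorted(resp["Datapoints"], key=lambda p: p["Timestamp"])
    latest = points[-1]["Average"] if points else 0.0
    return {"bucket": "example-data-lake", "size_bytes": latest}
```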
CLIENT: AMERICAN AIRLINES, FORT WORTH, TX.
ROLE: BIG DATA ENGINEER
DATE: JAN 2019 - MAY 2021
DESCRIPTION:
American Airlines is one of the largest airlines in the United States, serving millions of travelers per year both domestically and internationally through its extensive worldwide network. As a Big Data Engineer, I focused on migrating and optimizing large-scale data systems in cloud environments, developing and enhancing data processing, and deploying automation workflows in support of business operations.
RESPONSIBILITIES:
- Transitioned legacy systems to a cloud-based Amazon Web Services (AWS) solution and re-designed the applications to run on cloud platforms with minimal modifications to the architecture.
- Developed and implemented a Cloudera Hadoop environment; used the Spark DataFrame API to perform analytics on Hive data on the Cloudera platform.
- Created ARM templates while building the Azure backend and frontend services needed for applications on the Microsoft Azure cloud, and used Terraform in conjunction to build AWS services.
- Developed AWS CloudFormation templates to create custom infrastructure for the re-designed architecture and automated the deployment of the data pipelines through these templates.
- Designed and integrated AWS DynamoDB and HBase using AWS Lambda to store item values and back up the DynamoDB streams.
- Ingested data into Azure services (Azure Data Lake, Azure Storage, Azure SQL, Azure DW), processed the data in Azure Databricks, and estimated the cluster size.
- Developed Spark scripts using Python on Azure HDInsight for data aggregation and validation, verified their performance against MR jobs, and used them to monitor and manage the Hadoop cluster.
- Monitored and troubleshot Spark Databricks clusters, with expertise in Azure Databricks.
- Created a dataset process for data modeling and recommended ways to improve data quality, efficiency, and reliability. Worked with various compression and file formats such as Avro, Parquet, and text, and migrated MapReduce jobs to Spark jobs for better performance.
- Processed location and segment data from S3 to Snowflake using tasks, streams, pipes, and stored procedures (see the sketch after this section). Performed data quality analysis using SnowSQL and built warehouses on Snowflake.
- Employed security practices in AWS including multi-factor authentication, access key rotation, encryption using AWS Key Management Service, S3 bucket policies, and firewall security groups.
- Designed and developed CI/CD pipelines in Jenkins using Groovy scripts and Jenkinsfiles, integrating a variety of enterprise tools and testing frameworks into Jenkins to create fully automated pipelines that promote code from development workspaces to the production environment.
ENVIRONMENT: Amazon Web Services (AWS), Cloudera Hadoop, Spark, Hive, Microsoft Azure, ARM Templates, Terraform, AWS CloudFormation, AWS Lambda, AWS DynamoDB, HBase, Azure Data Lake, Azure Storage, Azure SQL, Azure Data Warehouse, Azure Databricks, Azure HDInsight, Jenkins, Snowflake, SnowSQL, Avro, Parquet, Groovy, S3, MapReduce, Python, Multi-Factor Authentication, AWS Key Management, S3 Bucket Policies, Firewall Security Groups.
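The S3-to-Snowflake loading mentioned above could look something like the sketch below, which uses the Snowflake Python connector to run a COPY INTO from an external stage. It is illustrative only: the connection parameters, stage, table, and file format are hypothetical placeholders, and in practice credentials would come from a secrets manager rather than being hard-coded.

```python
# Illustrative sketch: all connection parameters, stage, and table names are hypothetical.
import snowflake.connector

conn = snowflake.connector.connect(
    account="example_account",
    user="example_user",
    password="example_password",
    warehouse="LOAD_WH",
    database="ANALYTICS",
    schema="RAW",
)

try:
    cur = conn.cursor()
    # COPY from an external stage pointing at the S3 location that receives segment files.
    cur.execute("""
        COPY INTO RAW.SEGMENT_EVENTS
        FROM @RAW.S3_SEGMENT_STAGE
        FILE_FORMAT = (TYPE = PARQUET)
        MATCH_BY_COLUMN_NAME = CASE_INSENSITIVE
    """)
    print(cur.fetchall())   # per-file load results returned by COPY INTO
finally:
    conn.close()
```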
CLIENT: HYUNDAI MOTORS, FOUNTAIN VALLEY, CA.
ROLE: DATA ENGINEER
DATE: NOV 2015 - DEC 2018
DESCRIPTION:
Hyundai Motors is one of the most renowned automobile manufacturers in the world, producing electric and hybrid vehicles, among others, that incorporate innovative green and sustainability efforts. As a Data Engineer, I was responsible for designing and integrating cloud-based data solutions that drive business decisions by eliminating redundant data processing steps, streamlining data processes for optimal performance, and integrating them seamlessly across platforms for efficient data handling and analysis.
RESPONSIBILITIES:
- Designed and developed a comprehensive data services ecosystem spanning Relational, NoSQL, and Big Data technologies.
- Transformed business challenges into scalable Big Data solutions, defining Big Data strategies and roadmaps for data initiatives.
- Installed, configured, and maintained data pipelines to ensure smooth data flow across platforms.
- Developed Azure PowerShell scripts for efficient data transfer from local file systems to HDFS/Blob storage.
- Created and optimized ETL pipelines in Azure Data Factory (ADF) to extract, transform, and load data from sources including Azure SQL and Blob storage.
- Built and managed a data warehouse on the Azure platform, leveraging Azure Databricks and Azure Data Factory for streamlined data integration.
- Managed Azure Data Lake Storage (ADLS) and integrated it with various Azure services, applying U-SQL for data processing.
- Implemented data processing pipelines in Azure Synapse Analytics, integrating with Azure Databricks and DBT for scalable data workflows.
- Troubleshot and optimized Azure development configurations and performance issues, enhancing system efficiency.
- Managed and scheduled job executions using Autosys, ensuring timely data processing and system maintenance.
- Utilized Spark Streaming to process real-time data from Kafka, storing the data in HDFS and NoSQL databases (HBase, Cassandra).
- Optimized DynamoDB for high-performance data storage and retrieval, ensuring seamless integration with real-time applications.
- Leveraged Apache Spark (DataFrames, Spark SQL, MLlib) to develop and optimize data processing solutions, significantly improving user engagement metrics (see the sketch after this section).
- Implemented robust error handling and data validation strategies in Ab Initio ETL processes, ensuring data integrity.
- Used SQL Server Integration Services (SSIS) for ETL tasks, consolidating data from multiple sources efficiently.
- Conducted unit testing using NUnit, providing critical feedback to developers for continuous improvement.
ENVIRONMENT: Hadoop, Azure, Kafka, Spark, Sqoop, Docker, Docker Swarm, Spark SQL, Azure Synapse, Databricks, DBT, TDD, Spark Streaming, Hive, Scala, Pig, NoSQL, Impala, Oozie, HBase, Data Lake, Zookeeper.
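To illustrate the Spark DataFrame and Spark SQL work referenced above, here is a minimal, hypothetical aggregation sketch; the storage paths, column names, and metric definitions are placeholders rather than actual project details.

```python
# Illustrative batch transformation with Spark DataFrames and Spark SQL.
# Input/output paths, column names, and the metric itself are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("engagement-metrics").getOrCreate()

# Read raw engagement events (Parquet assumed here).
events = spark.read.parquet("abfss://raw@examplelake.dfs.core.windows.net/events/")

# Daily engagement per user, computed with the DataFrame API.
daily = (events
         .withColumn("event_date", F.to_date("event_ts"))
         .groupBy("user_id", "event_date")
         .agg(F.count("*").alias("events"),
              F.countDistinct("session_id").alias("sessions")))

# The same result expressed in Spark SQL against a temporary view.
events.createOrReplaceTempView("events")
daily_sql = spark.sql("""
    SELECT user_id, to_date(event_ts) AS event_date,
           COUNT(*) AS events, COUNT(DISTINCT session_id) AS sessions
    FROM events
    GROUP BY user_id, to_date(event_ts)
""")

daily.write.mode("overwrite").parquet(
    "abfss://curated@examplelake.dfs.core.windows.net/daily_engagement/")
```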
CLIENT: DEUTSCHE BANK, BANGALORE, INDIA.
ROLE: DATA ENGINEER
DATE: OCT 2014 - OCT 2015
DESCRIPTION:
Deutsche Bank's strong presence in financial services has positioned it as one of the leading investment banks globally, serving a wide array of clients across the globe. The bank emphasizes delivering top-tier financial services that address client needs worldwide, focusing on innovative solutions that drive operational efficiency and handle highly complex business operations, within a dynamic work environment that thrives on collaboration and expertise.
RESPONSIBILITIES:
- Involved in the review of functional and non-functional requirements.
- Installed and configured Hadoop MapReduce and HDFS; developed multiple MapReduce jobs.
- Created consumption views on top of metrics to reduce the running time of complex queries.
- Registered business and technical datasets to their corresponding SQL scripts using Nebula Metadata.
- Developed Spark code and Spark SQL/Streaming for faster testing and processing of data.
- Installed and configured Pig, and authored Pig Latin scripts.
- Designed and implemented a large-scale parallel relation-learning system based on MapReduce.
- Set up and benchmarked internally used Hadoop/HBase clusters.
- Designed metric tables and end-user views in Snowflake to feed Tableau refreshes.
- Wrote MapReduce jobs using Pig Latin; involved in ETL, data integration, and migration.
- Regularly imported and exported data between HDFS and an Oracle database using Sqoop (see the sketch after this section).
- Wrote Hive queries for data analysis according to business needs.
- Worked with the Spark ecosystem, leveraging Spark SQL and Scala on top of different file formats.
ENVIRONMENT: Hadoop, MapReduce, HDFS, Hive, Java, Cloudera Hadoop distribution, Pig, HBase, Linux, XML, Tableau, Eclipse, Oracle 10g, Toad.
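As a rough illustration of the Sqoop-based transfers between Oracle and HDFS described above, the sketch below drives a Sqoop import from Python via subprocess; the JDBC URL, credentials path, table, and target directory are hypothetical placeholders.

```python
# Illustrative sketch: invoking a Sqoop import from Python.
# The JDBC URL, password file, table, and target directory are hypothetical.
import subprocess

sqoop_cmd = [
    "sqoop", "import",
    "--connect", "jdbc:oracle:thin:@dbhost:1521/ORCL",
    "--username", "etl_user",
    "--password-file", "hdfs:///user/etl/.oracle_pass",
    "--table", "TRANSACTIONS",
    "--target-dir", "/data/raw/transactions",
    "--num-mappers", "4",
    "--as-parquetfile",
]

# Run the import and fail loudly if Sqoop returns a non-zero exit code.
subprocess.run(sqoop_cmd, check=True)
```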
