Data Engineer Resume Chicago, IL

Candidate Information
Title: Data Engineer
Target Location: US-IL-Chicago
 Candidate's Name
DATA ENGINEER
EMAIL AVAILABLE | PHONE NUMBER AVAILABLE | LinkedIn

PROFESSIONAL SUMMARY:
- 10+ years of IT experience as a Data Engineer in software analysis, design, development, testing, and implementation of Big Data, Hadoop, NoSQL, and Python technologies.
- In-depth experience and good knowledge of Hadoop ecosystem tools such as MapReduce, HDFS, Pig, Hive, Kafka, YARN, Sqoop, Storm, Spark, Oozie, and Zookeeper.
- Proficient in Python scripting; worked with statistical functions in NumPy, visualization with Matplotlib, and data organization with Pandas.
- Experience with different Hadoop distributions such as Cloudera and Hortonworks Data Platform (HDP).
- In-depth understanding of Hadoop architecture, including YARN and components such as HDFS, Resource Manager, Node Manager, Name Node, and Data Node.
- Hands-on experience importing and exporting data between RDBMS and HDFS using Sqoop.
- Experience with the Hive data warehouse tool: creating tables, distributing data by static and dynamic partitioning, bucketing, and applying Hive optimization techniques.
- Experience working with Cassandra and NoSQL databases including MongoDB and HBase.
- Experience in tuning and debugging Spark applications and using Spark optimization techniques.
- Experience building PySpark and Spark-Scala applications for interactive analysis, batch processing, and stream processing.
- Good knowledge of the MapR distribution and Amazon EMR.
- Worked with AWS cloud services (VPC, EC2, S3, Redshift, Data Pipeline, EMR, DynamoDB, Lambda, and SQS).
- Experience with Snowflake and Lambda data processing: collecting, aggregating, and moving data from various sources using Apache Flume and Kafka.
- Good knowledge of Spark components such as Spark SQL, MLlib, Spark Streaming, and GraphX.
- Extensively worked with Spark Streaming and Apache Kafka to fetch live stream data.
- Hands-on experience creating real-time data streaming solutions using Apache Spark Core, Spark SQL, and DataFrames.
- Hands-on experience with the Google Cloud Platform (GCP) big data products: BigQuery, Cloud Dataproc, Google Cloud Storage, and Composer (Airflow as a service).
- Developed and deployed outcomes using Spark and Scala code on a Hadoop cluster running on GCP.
- Extensive knowledge of implementing, configuring, and maintaining Amazon Web Services (AWS) such as EC2, S3, Redshift, Glue, and Athena.
- Hands-on experience designing and developing applications in Spark using Scala and PySpark to compare the performance of Spark with Hive and SQL/Oracle.
- Experience working with the Azure cloud platform (HDInsight, Data Lake, Databricks, Blob Storage, Data Factory, Synapse, SQL, SQL DB, DWH, and Data Storage Explorer).
- Experienced in data manipulation using Python and Python libraries such as Pandas, NumPy, SciPy, and Scikit-Learn for data analysis, numerical computation, and machine learning.
- Experience writing SQL queries; experience in data integration and performance tuning.
- Developed various shell scripts and Python scripts to automate Spark jobs and Hive scripts.
- Hands-on experience with visualization tools such as Tableau and Power BI. Experience with Git and Bitbucket version control systems. Experience working with COSMOS.
- Experience building and architecting multiple data pipelines and end-to-end ETL and ELT processes for data ingestion and transformation in GCP, and coordinating tasks among the team.
- Extensive experience working in Test-Driven Development and Agile-Scrum development.
- Working knowledge of data mining principles: predictive analytics, mapping, and collecting data from multiple on-premises and cloud-based data sources.
- Experience in data warehousing, databases, data visualization, statistics, data analysis, A/B experiments, reporting, basic machine learning, and modeling.
- Keen on keeping up with the newer technology stack that Google Cloud Platform (GCP) adds.
- Experience using the Stackdriver service and Dataproc clusters in GCP to access logs for debugging.

TECHNICAL SKILLS:
Big Data Technologies: Apache Spark, HDFS, MapReduce, Hive, Sqoop, Oozie, Zookeeper, PyCharm
Programming Languages: Python, Scala, SQL, PL/SQL, Linux shell scripts
NoSQL Databases: HBase, Cassandra, MongoDB, DynamoDB
Databases: Oracle 11g/10g, MySQL, MS SQL Server, DB2, Teradata
Reporting Tools: Power BI, Data Studio, Tableau
Python: Pandas, NumPy, SciPy, Matplotlib
Operating Systems: Linux, Unix, Windows
Cloud: AWS, Azure, GCP
Distributed Platforms: Cloudera, Hortonworks, MapR
GCP: Cloud Storage, BigQuery, Composer, Cloud Dataproc, Cloud SQL, Cloud Functions, Cloud Pub/Sub
AWS: EC2, IAM, S3, Auto Scaling, CloudWatch, Route 53, EMR, DynamoDB
Azure: MS Azure, Azure SQL Database, Azure SQL Data Warehouse, Azure Analysis Services, Azure Data Lake, Data Factory

EDUCATION DETAILS:

PROFESSIONAL EXPERIENCE:

Client: Abbott, Maine                                                                                  October 2021 - Present
Role: Azure Data Engineer
Responsibilities:
- Extensively worked with the Azure cloud platform (HDInsight, Data Lake, Databricks, Blob Storage, Data Factory, Synapse, SQL, SQL DB, DWH, and Data Storage Explorer).
- Extracted, transformed, and loaded data from source systems to Azure data storage services using a combination of Azure Data Factory, Spark SQL, and U-SQL (Azure Data Lake Analytics).
- Ingested data into one or more Azure services (Azure Data Lake, Azure Storage, Azure SQL, Azure DW) and processed the data in Azure Databricks.
- Created pipelines in Azure Data Factory (ADF) using Linked Services, Datasets, and Pipelines to extract, transform, and load data between different sources such as Azure SQL, Blob storage, Azure SQL DW, and the write-back tool, and backwards.
- Created an Application Interface Document for the downstream team to build a new interface to transfer and receive files through Azure Data Share.
- Designed and configured Azure cloud relational servers and databases, analyzing current and future business requirements.
- Built and architected multiple data pipelines and end-to-end ETL and ELT processes for data ingestion and transformation.
- Wrote PowerShell scripts to automate the creation of Azure resource groups, web applications, Azure Storage blobs and tables, and firewall rules.
- Worked on migration of data from on-prem SQL Server to cloud databases (Azure Synapse Analytics (DW) and Azure SQL DB).
- Configured input and output bindings of Azure Functions with an Azure Cosmos DB collection to read and write data from the container whenever the function executes.
- Designed and deployed data pipelines using Data Lake, Databricks, and Apache Airflow.
- Developed elastic pool databases and scheduled elastic jobs to execute T-SQL procedures.
- Developed Spark applications using PySpark and Spark SQL for data extraction, transformation, and aggregation from multiple file formats, analyzing and transforming the data to uncover insights into customer usage patterns.
- Ingested data in mini-batches and performed RDD transformations on those mini-batches using Spark Streaming to run streaming analytics in Databricks.
- Created and provisioned the Databricks clusters needed for batch and continuous streaming data processing and installed the required libraries on the clusters.
- Created several Databricks Spark jobs with PySpark to perform table-to-table operations.
- Created data pipelines for different events from Azure Blob storage into Hive external tables. Used various Hive optimization techniques such as partitioning, bucketing, and map joins.
- Developed data ingestion pipelines on an Azure HDInsight Spark cluster using Azure Data Factory and Spark SQL. Also worked with Cosmos DB (SQL API and Mongo API).
- Designed custom-built input adapters using Spark, Hive, and Sqoop to ingest and analyze data (Snowflake, MS SQL, MongoDB) into HDFS.
- Developed automated job flows that run through Oozie daily and on demand, which execute MapReduce jobs internally.
- Extracted tables and exported data from Teradata through Sqoop and placed the data in Cassandra.
Environment: Spark, Azure, Spark Streaming, Spark SQL, MapR, HDFS, Hive, Pig, Apache Kafka, Sqoop, Python, Scala, PySpark, shell scripting, Linux, MySQL, NoSQL, Solr, Jenkins, Eclipse, Oracle, Git, Oozie, Tableau, Power BI, SOAP, Cassandra, and Agile methodologies.
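
For illustration, the following is a minimal PySpark sketch of the Blob storage/Data Lake ingestion and table-load pattern described in the bullets above, as it might run in a Databricks job. The storage account, container, columns, and table name are hypothetical placeholders, and in practice credentials would come from a secret scope or service principal rather than an inline key.

```python
# Minimal sketch of an ADLS-to-table ingestion job; all names are placeholders.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("adls-ingest").getOrCreate()

# Authenticate to the storage account (a secret scope or service principal
# would normally be used instead of a literal key).
spark.conf.set(
    "fs.azure.account.key.examplestorage.dfs.core.windows.net",
    "<storage-account-key>",
)

# Read raw event JSON landed in the container by Azure Data Factory.
events = spark.read.json(
    "abfss://raw@examplestorage.dfs.core.windows.net/events/2021/10/"
)

# Basic cleansing and derivation before persisting.
cleaned = (
    events
    .dropDuplicates(["event_id"])
    .withColumn("event_date", F.to_date("event_ts"))
)

# Write to a partitioned table that downstream Hive/Synapse queries can read.
(
    cleaned.write
    .mode("append")
    .partitionBy("event_date")
    .saveAsTable("analytics.usage_events")
)
```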

Client: Comcast, New Jersey                                                                            August 2020 - September 2021
Role: AWS Data Engineer
Responsibilities:
- Developed Spark applications using Python and implemented an Apache Spark data processing project to handle data from various RDBMS and streaming sources.
- Worked with Spark to improve the performance and optimization of existing algorithms in Hadoop using Spark Context, Spark SQL, Spark MLlib, DataFrames, pair RDDs, and Spark on YARN.
- Tuned Spark applications to set the batch interval time, the correct level of parallelism, and memory usage.
- Used Spark Streaming APIs to perform on-the-fly transformations and actions for building a common learner data model, which gets data from Kafka in real time and persists it to Cassandra.
- Scheduled Spark/Scala jobs using Oozie workflows on the Hadoop cluster and generated detailed design documentation for the source-to-target transformations.
- Developed Kafka consumer APIs in Python for consuming data from Kafka topics.
- Used Kafka to consume XML messages and Spark Streaming to process the XML files to capture UI updates.
- Valuable experience with the practical implementation of cloud-specific technologies including IAM and Amazon cloud services such as Elastic Compute Cloud (EC2), ElastiCache, Simple Storage Service (S3), CloudFormation, Virtual Private Cloud (VPC), Route 53, Lambda, Glue, and EMR.
- Migrated an existing on-premises application to AWS and used AWS services such as EC2 and S3 for small data set processing and storage.
- Loaded data into S3 buckets using AWS Lambda functions, AWS Glue, and PySpark; filtered data stored in S3 buckets using Elasticsearch and loaded data into Hive external tables. Maintained and operated the Hadoop cluster on AWS EMR.
- Used an AWS EMR Spark cluster and Cloud Dataflow to compare the efficiency of a POC on a developed pipeline.
- Configured Snowpipe to pull data from S3 buckets into Snowflake tables and stored incoming data in the Snowflake staging area.
- Created live real-time processing and core jobs using Spark Streaming with Kafka as the data pipeline system.
- Worked on Amazon Redshift to consolidate all data warehouses into one data warehouse.
- Designed column families in Cassandra, ingested data from RDBMS, performed data transformations, and exported the transformed data to Cassandra per business requirements.
- Designed, developed, deployed, and maintained MongoDB.
- Worked extensively on Hadoop components such as HDFS, Job Tracker, Task Tracker, Name Node, Data Node, YARN, Spark, and MapReduce programming.
- Worked extensively with Sqoop for importing and exporting data between HDFS and relational database systems (RDBMS).
- Wrote several MapReduce jobs using PySpark and NumPy, and used Jenkins for continuous integration.
- Created HBase tables to load large sets of structured, semi-structured, and unstructured data coming from UNIX, NoSQL, and a variety of portfolios.
- Optimized Hive tables using techniques such as partitioning and bucketing to provide better performance for HiveQL queries.
- Worked on cloud deployments using Maven, Docker, and Jenkins.
- Worked on custom loaders and storage classes in Pig to handle several data formats such as JSON, XML, and CSV, and generated bags for processing using Pig.
- Generated various kinds of reports using Power BI and Tableau based on client specifications.
Environment: AWS EMR, S3, EC2, Lambda, MapR, Apache Spark, Spark Streaming, Spark SQL, HDFS, Hive, Pig, Apache Kafka, Sqoop, Flume, Python, Scala, shell scripting, Linux, MySQL, HBase, NoSQL, DynamoDB, Cassandra, machine learning, Snowflake, Maven, Docker, AWS Glue, Jenkins, Eclipse, Oracle, Git, Oozie, Tableau, Power BI.
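
As a sketch of the Kafka-to-Cassandra streaming pattern described in the bullets above (here written with Spark Structured Streaming rather than the older DStream API), the brokers, topic, schema, keyspace, and table are hypothetical, and the spark-sql-kafka and DataStax spark-cassandra-connector packages are assumed to be on the cluster classpath.

```python
# Hypothetical Kafka -> Spark Structured Streaming -> Cassandra pipeline.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.types import StructType, StructField, StringType, TimestampType

spark = SparkSession.builder.appName("learner-stream").getOrCreate()

schema = StructType([
    StructField("learner_id", StringType()),
    StructField("course_id", StringType()),
    StructField("event_ts", TimestampType()),
])

# Consume JSON events from Kafka in real time.
raw = (
    spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "broker1:9092")
    .option("subscribe", "learner-events")
    .load()
)

events = (
    raw.selectExpr("CAST(value AS STRING) AS json")
    .select(F.from_json("json", schema).alias("e"))
    .select("e.*")
)

# Persist each micro-batch to Cassandra via the DataStax connector.
def write_to_cassandra(batch_df, batch_id):
    (
        batch_df.write
        .format("org.apache.spark.sql.cassandra")
        .options(keyspace="learning", table="learner_events")
        .mode("append")
        .save()
    )

query = (
    events.writeStream
    .foreachBatch(write_to_cassandra)
    .option("checkpointLocation", "/tmp/checkpoints/learner-events")
    .start()
)
query.awaitTermination()
```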

Client: Progressive Insurance, Atlanta                                                                 February 2019 - July 2020
Role: AWS Data Engineer
Responsibilities:
- Analyzed large datasets to determine the optimal way to aggregate and report on them.
- Developed Spark applications using Scala and implemented an Apache Spark data processing project to handle data from various RDBMS and streaming sources.
- Worked with Spark to improve the performance and optimization of existing algorithms in Hadoop using Spark Context, Spark SQL, Spark MLlib, DataFrames, pair RDDs, and Spark on YARN.
- Used Spark Streaming to receive real-time data from Kafka and store the stream data to HDFS using Scala and to NoSQL databases such as HBase and Cassandra.
- Used Kafka for live streaming data and performed analytics on it. Worked with Sqoop to transfer data between relational databases and Hadoop.
- Loaded data from web servers and Teradata using Sqoop, Flume, and the Spark Streaming API.
- Developed MapReduce/Spark Python modules for machine learning and predictive analytics in Hadoop on AWS.
- Implemented a Python-based distributed random forest via Python streaming.
- Designed and deployed multi-tier applications using AWS services (EC2, Route 53, S3, RDS, DynamoDB, SNS, SQS, IAM), focusing on high availability, fault tolerance, and auto-scaling with AWS CloudFormation.
- Wrote AWS Lambda code in Python for nested JSON files: converting, comparing, sorting, etc.
- Created AWS data pipelines using various AWS resources, including AWS API Gateway to receive responses from AWS Lambda, retrieving data from Snowflake with a Lambda function, and converting the response into JSON format, backed by Snowflake, DynamoDB, AWS Lambda, and AWS S3.
- Migrated an existing on-premises application to AWS. Used AWS services such as EC2 and S3 for small data set processing and storage; experienced in maintaining the Hadoop cluster on AWS EMR.
- Imported data from AWS S3 into Spark RDDs and performed transformations and actions on the RDDs.
- Wrote multiple MapReduce programs for data extraction, transformation, and aggregation from multiple file formats, including XML, JSON, CSV, and other compressed file formats.
- Used Pandas, NumPy, Seaborn, SciPy, Matplotlib, Scikit-learn, and NLTK in Python to develop various machine learning algorithms, and utilized algorithms such as linear regression and multivariate regression for data analysis.
- Developed Python code for tasks, dependencies, and time sensors for each job for workflow management and automation using Airflow.
- Worked on cloud deployments using Maven, Docker, and Jenkins.
- Created Glue jobs to process data from the S3 staging area to the S3 persistence area.
- Scheduled Spark/Scala jobs using Oozie workflows on the Hadoop cluster and generated detailed design documentation for the source-to-target transformations.
- Proficient in utilizing data for interactive Power BI dashboards and reporting based on business requirements.
Environment: AWS, MySQL, Snowflake, MongoDB, Cassandra, Teradata, Flume, Tableau, Power BI, Git, Blob Storage, Data Factory, Data Storage Explorer, Scala, Hadoop (HDFS, MapReduce, YARN), Spark v2.0.2, PySpark, Airflow, Hive, Sqoop, HBase, Oozie.
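
The nested-JSON Lambda work mentioned above could look roughly like the following hypothetical handler, triggered by an S3 put event; bucket names, key layout, and field names are placeholders only.

```python
# Hypothetical AWS Lambda handler: read a nested-JSON file from S3,
# flatten and sort the records, and write the result back to S3.
import json
import boto3

s3 = boto3.client("s3")


def flatten(record, parent_key="", out=None):
    """Flatten nested dictionaries into dotted keys."""
    out = {} if out is None else out
    for key, value in record.items():
        full_key = f"{parent_key}.{key}" if parent_key else key
        if isinstance(value, dict):
            flatten(value, full_key, out)
        else:
            out[full_key] = value
    return out


def lambda_handler(event, context):
    # Assumes the function is wired to an S3 object-created trigger.
    bucket = event["Records"][0]["s3"]["bucket"]["name"]
    key = event["Records"][0]["s3"]["object"]["key"]

    body = s3.get_object(Bucket=bucket, Key=key)["Body"].read()
    records = json.loads(body)

    # Flatten each nested record and sort by an illustrative business key.
    flattened = sorted(
        (flatten(r) for r in records),
        key=lambda r: r.get("policy.id", ""),
    )

    s3.put_object(
        Bucket=bucket,
        Key=f"processed/{key}",
        Body=json.dumps(flattened).encode("utf-8"),
    )
    return {"statusCode": 200, "records": len(flattened)}
```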

Client: JPMorgan Chase, New York                                                                       August 2017 - February 2019
Role: GCP Data Engineer
Responsibilities:
- Maintained the infrastructure of multiple projects across the organization on Google Cloud Platform using Terraform (infrastructure as code).
- Made existing BigQuery reporting with Tableau more performant using techniques such as partitioning on the right column, and tested the solutions under different scenarios.
- Developed ELT processes from Ab Initio files and Google Sheets in GCP, with compute on Dataprep, Dataproc (PySpark), and BigQuery.
- Migrated an Oracle SQL ETL to run on Google Cloud Platform using Cloud Dataproc and BigQuery, with Cloud Pub/Sub triggering the Airflow jobs.
- Worked with Presto, Hive, Spark SQL, and BigQuery using Python client libraries, building interoperable and faster programs for analytics platforms.
- Used Apache Airflow in the GCP Composer environment to build data pipelines, and used various Airflow operators such as the bash operator, Hadoop operators, Python callables, and branching operators.
- Developed new techniques for orchestrating the Airflow-built pipelines and used Airflow environment variables for defining project-level settings and encrypting passwords.
- Working knowledge of Kubernetes in GCP; created new monitoring techniques using the Stackdriver log router and designed reports in Data Studio.
- Served as an integrator between data architects, data scientists, and other data consumers.
- Converted SAS code to Python/Spark-based jobs in Cloud Dataproc and BigQuery in GCP.
- Used Cloud Pub/Sub and Cloud Functions for specific use cases such as triggering workflows upon messages.
- Developed data pipelines with Cloud Composer for orchestration, Cloud Dataflow for building scalable machine learning algorithms for clustering, and Cloud Dataprep for exploration.
- Migrated previously written Cloud Dataprep jobs to BigQuery.
- Worked closely with security teams by providing logs for firewalls and VPCs and setting up rules in GCP for vulnerability management.
- Created custom roles for sandbox environments using Terraform to avoid vulnerabilities.
Environment: GCP, Python, BigQuery, PySpark, Tableau, ETL, Oracle, SQL, Hive, Apache Airflow, Dataprep, Dataproc, Stackdriver, Bash, Hadoop, Kubernetes, SAS.
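
A hypothetical Cloud Composer (Airflow) DAG sketching the operator mix mentioned above: a bash task, a Python callable, and a BigQuery load step. The DAG id, bucket, project, dataset, and query are placeholders, and the Google provider package is assumed to be installed on the Composer environment.

```python
# Hypothetical Composer DAG: bash task -> Python validation -> BigQuery load.
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator
from airflow.operators.python import PythonOperator
from airflow.providers.google.cloud.operators.bigquery import BigQueryInsertJobOperator


def validate_extract(**context):
    # Placeholder validation step before loading to BigQuery.
    print("validating extract for", context["ds"])


with DAG(
    dag_id="daily_bq_load",
    start_date=datetime(2019, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    pull_files = BashOperator(
        task_id="pull_files",
        bash_command="gsutil ls gs://example-landing-bucket/{{ ds }}/",
    )

    validate = PythonOperator(
        task_id="validate_extract",
        python_callable=validate_extract,
    )

    load_to_bq = BigQueryInsertJobOperator(
        task_id="load_to_bq",
        configuration={
            "query": {
                "query": "SELECT * FROM `example.staging.events` WHERE dt = '{{ ds }}'",
                "destinationTable": {
                    "projectId": "example-project",
                    "datasetId": "analytics",
                    "tableId": "events",
                },
                "writeDisposition": "WRITE_APPEND",
                "useLegacySql": False,
            }
        },
    )

    pull_files >> validate >> load_to_bq
```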

Client: Ingenious Minds Lab, Ahmedabad                                                                 May 2013 - June 2015
Role: Data Engineer
Responsibilities:
- Gathered data and business requirements from end users and management.
- Designed and built data solutions to migrate existing source data from the data warehouse to Big Data.
- Analyzed huge volumes of data and devised simple and complex Hive and SQL scripts to validate data flow in various applications.
- Performed Cognos report validation. Used MHUB to validate data profiling and data lineage.
- Performed Hive tuning techniques such as partitioning, bucketing, and memory optimization.
- Partnered with ETL developers to ensure that data is well cleaned and the data warehouse is up to date for reporting purposes using Pig.
- Supported data quality management by implementing proper data quality checks in data pipelines.
- Built machine learning models to showcase big data capabilities using PySpark and MLlib.
- Enhanced the data ingestion framework by creating more robust and secure data pipelines.
- Implemented data streaming capability using Kafka and Talend for multiple data sources.
- Worked with multiple storage formats (Avro, Parquet) and databases (Hive, Impala).
- Used the Oozie workflow engine to manage interdependent Hadoop jobs and to automate several types of Hadoop jobs such as Java MapReduce, Hive, Pig, and Sqoop.
- Used Spark SQL to load data and created schema RDDs on top of it, which were loaded into Hive tables; handled structured data using Spark SQL.
- Implemented a CI/CD pipeline with Docker, Jenkins, and GitHub by virtualizing the Dev and Test environment servers with Docker and configuring automation through containerization.
- Involved in the development of agile, iterative, and proven data modeling patterns that provide flexibility.
- Performed data validation with record-wise counts between source and destination.
- Worked on the data support team handling bug fixes, schedule changes, memory tuning, schema changes, and loading of historic data.
- Worked with the Scrum team to deliver agreed user stories on time for every sprint.
- Worked on analyzing and resolving production job failures in several scenarios.
- Worked on migrating MapReduce programs into Spark transformations using Spark and Scala, initially done in Python (PySpark).
- Devised PL/SQL stored procedures, functions, triggers, views, and packages. Used indexing, aggregation, and materialized views to optimize query performance.
- Implemented Apache Airflow for authoring, scheduling, and monitoring data pipelines.
- Worked with Confluence and Jira; skilled in data visualization libraries such as Matplotlib and Seaborn.
Environment: Hive, Hadoop, HDFS, Python, PL/SQL, SQL, R programming, Apache Airflow, NumPy, Pandas, Jira, Pig, Tableau, Spark, Linux.
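
As a sketch of the Spark SQL-to-Hive loading pattern mentioned above (written with the current DataFrame/SparkSession API rather than the original SchemaRDD one), the file path, columns, database, and table names are hypothetical.

```python
# Minimal Spark SQL -> Hive load: read a source file, register a temp view,
# and insert it into a Hive table. All names are placeholders.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("spark-sql-to-hive")
    .enableHiveSupport()
    .getOrCreate()
)

# Load the raw extract with an inferred schema.
orders = (
    spark.read
    .option("header", "true")
    .option("inferSchema", "true")
    .csv("/data/landing/orders.csv")
)

# Expose the DataFrame to Spark SQL, then load it into a Hive table.
orders.createOrReplaceTempView("orders_stg")

spark.sql("CREATE DATABASE IF NOT EXISTS warehouse")
spark.sql("""
    CREATE TABLE IF NOT EXISTS warehouse.orders
    (order_id STRING, customer_id STRING, amount DOUBLE, order_date DATE)
    STORED AS PARQUET
""")
spark.sql("""
    INSERT INTO TABLE warehouse.orders
    SELECT order_id, customer_id, CAST(amount AS DOUBLE), CAST(order_date AS DATE)
    FROM orders_stg
""")
```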
