
Azure Data Engineer Resume Dayton, OH

Candidate Information
Title: Azure Data Engineer
Target Location: US-OH-Dayton

ABOUT ME
5+ years of IT experience in architecture, analysis, design, development, implementation, maintenance, and support, with experience in developing strategic methods for deploying big data technologies to efficiently solve Big Data processing requirements.

PROFILE SUMMARY
- Results-driven Data Engineer with 5+ years of experience in designing and implementing robust data solutions to drive business insights and enhance data-driven decision-making.
- Extensive experience in IT data analytics projects; hands-on experience migrating on-premises ETL workloads to Google Cloud Platform (GCP) using cloud-native tools such as BigQuery, Cloud Dataproc, Google Cloud Storage, and Composer.
- Experience in evaluation, design, development, and deployment of additional technologies and automation for managed services on S3, Lambda, Athena, EMR, Kinesis, SQS, SNS, CloudWatch, Data Pipeline, Redshift, DynamoDB, AWS Glue, Aurora, RDS, and EC2.
- Practical knowledge in setting up and designing large-scale data lakes, pipelines, and effective ETL (extract, transform, load) procedures to collect, organize, and standardize data. Converted an existing on-premises application to use Azure cloud databases and storage; hands-on experience with Azure SQL Database, Azure SQL Data Warehouse, Azure Analysis Services, HDInsight, Azure Data Lake, and Data Factory.
- Experience in Big Data analytics and data manipulation using Hadoop ecosystem tools: MapReduce, YARN/MRv2, Pig, Hive, HDFS, HBase, Spark, Kafka, Flume, Sqoop, Oozie, Avro, ZooKeeper, AWS, and Spark integration with Cassandra.
- Used Spark Streaming, HBase, and Kafka for real-time data integration.
- Good understanding of Spark architecture with Databricks and Structured Streaming, including setting up AWS and Microsoft Azure with Databricks, Databricks Workspace for business analytics, managing clusters in Databricks, and managing the machine learning lifecycle. Highly skilled with visualization tools such as Tableau, matplotlib, and ggplot2 for creating dashboards.
- Practical knowledge in using Sqoop to import and export data between relational database systems and HDFS.
- Experience automating day-to-day activities using Windows PowerShell.
- Developed DDL and DML statements for data modeling and data storage, and performed performance fine-tuning.
- Experience developing sophisticated SQL queries, procedures, and triggers on RDBMSs such as Oracle and MySQL.
- Proficient in utilizing Kubernetes and Docker for designing and implementing data pipelines.
- Hands-on experience interacting with REST APIs built on a microservices architecture to retrieve data from different sources. Proficient in building CI/CD pipelines in Jenkins using pipeline syntax and Groovy libraries.
- Worked with Matillion, leveraging Snowflake's separate compute and storage resources for rapid transformation and getting the most from Snowflake-specific features such as ALTER WAREHOUSE and FLATTEN over VARIANT, OBJECT, and ARRAY types.
- Used Jenkins as a continuous integration / continuous deployment tool.
- Proficient with Spark Core, Spark SQL, Spark MLlib, Spark GraphX, and Spark Streaming for processing and transforming complex data using in-memory computing, written in Scala.
- Implemented production scheduling jobs using Control-M and Airflow.

EDUCATION
Master's degree, Saint Peter's University, USA

WORK EXPERIENCE

Client: Homebridge Financial Services, Iselin, New Jersey, USA (Sep 2023 - Present)
Role: Azure Data Engineer
Description: Homebridge Financial Services, Inc. is a privately held, non-bank loan company. Developed and optimized multi-threaded ingestion jobs and Sqoop scripts to import data from FTP servers and data warehouses.
Responsibilities:
- Worked hands-on with ETL processes; handled data imported from various sources and performed transformations.
- Developed custom multi-threaded Java-based ingestion jobs as well as Sqoop jobs for ingesting data from FTP servers and data warehouses.
- Enabled monitoring and Azure Log Analytics to alert the support team on usage and statistics of the daily runs.
- Developed Sqoop scripts to migrate data from Oracle to the big data environment.
- Used Kafka as a messaging system to implement real-time streaming solutions with Spark Streaming.
- Created Databricks job workflows that extract data from SQL Server and upload the files to SFTP using PySpark and Python (see the sketch following this section).
- Used Spark SQL through the Scala and Python interfaces, which automatically convert RDDs of case classes to schema RDDs.
- Created Hive tables, and loaded and analyzed data using Hive scripts.
- Utilized C# and ADO.NET to establish connections to databases such as SQL Server, Oracle, and MySQL, enabling efficient data retrieval and manipulation.
- Developed JSON definitions for deploying pipelines in Azure Data Factory (ADF) that process data using the SQL activity, and created UNIX shell scripts for database connectivity and parallel query execution.
- Built and configured Jenkins slaves for parallel job execution; installed and configured Jenkins for continuous integration and performed continuous deployments.
- Used Docker for managing application environments.
- Ensured data integrity and consistency during migration, resolving compatibility issues with T-SQL scripting.
- Developed Kibana dashboards based on Logstash data and integrated different source and target systems into Elasticsearch for near-real-time log analysis of end-to-end transactions.
- Worked with CI/CD tools such as Jenkins and Docker on the DevOps team, setting up the application process end to end, with deployments to lower environments and deliveries to higher environments gated by approvals.
- Implemented Synapse integration with Azure Databricks notebooks, cutting development work roughly in half, and improved Synapse loading performance by implementing a dynamic partition switch.
- Built PySpark code to validate data from raw sources against Snowflake tables.
Environment: Azure, Data Factory, Docker, EC2, Elasticsearch, ETL, Hive, Java, Jenkins, JSON, Kafka, Data Lake, Lambda, MySQL, Oracle, PySpark, Python, Scala, Snowflake, Spark, Spark SQL, Spark Streaming, SQL, Sqoop
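
For illustration, a minimal PySpark sketch of the SQL Server-to-SFTP extract described above, assuming JDBC connectivity from the cluster and the paramiko client on the driver; the hosts, credentials, table, and paths are placeholders, not the actual job:

    # Illustrative sketch only: extract a SQL Server table with PySpark and push a CSV to SFTP.
    # Connection details, table names, and paths below are placeholders.
    import paramiko
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("sqlserver_to_sftp").getOrCreate()

    # Read the source table over JDBC (assumes the SQL Server JDBC driver is on the cluster classpath).
    orders = (spark.read.format("jdbc")
              .option("url", "jdbc:sqlserver://sql-host:1433;databaseName=sales")
              .option("dbtable", "dbo.daily_orders")
              .option("user", "etl_user")
              .option("password", "********")
              .load())

    # Stage a single CSV on the driver (fine for modest extracts; larger data would be written distributed).
    orders.toPandas().to_csv("/tmp/daily_orders.csv", index=False)

    # Upload the staged file to the downstream SFTP server.
    transport = paramiko.Transport(("sftp.partner.example.com", 22))
    transport.connect(username="feed_user", password="********")
    sftp = paramiko.SFTPClient.from_transport(transport)
    sftp.put("/tmp/daily_orders.csv", "/inbound/daily_orders.csv")
    sftp.close()
    transport.close()
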
Client: Catalent, Somerset, New Jersey, USA (Nov 2022 - Aug 2023)
Role: AWS Data Engineer
Description: Catalent, Inc. is a global leader in enabling pharma, biotech, and consumer health partners to optimize product development, launch, and full life-cycle supply for patients. Developed and maintained data backup and disaster recovery plans to ensure data availability in case of system failures or other emergencies.
Responsibilities:
- Integrated AWS DynamoDB with AWS Lambda to store item values and back up the DynamoDB streams (see the sketch following this section).
- Applied ETL best practices while designing and implementing ETL workflows using Talend.
- Stored and processed data using low-level Java APIs to ingest data directly into HBase.
- Loaded and transformed large sets of structured, semi-structured, and unstructured data using Hadoop/Big Data concepts.
- Used Kafka capabilities such as distribution, partitioning, and the replicated commit log service for messaging systems, maintaining feeds.
- Developed Databricks ETL pipelines using notebooks, Spark DataFrames, Spark SQL, and Python scripting.
- Used Python-based GUI components for front-end functionality such as selection criteria.
- Ingested data in mini-batches and performed RDD transformations on those mini-batches using Spark Streaming for streaming analytics in Databricks.
- Developed analytical components using Scala, Spark, Apache Mesos, and Spark Streaming; installed Hadoop, MapReduce, and HDFS and developed multiple MapReduce jobs in Pig and Hive for data cleaning and pre-processing.
- Automated and monitored AWS infrastructure with Terraform for high availability and reliability, reducing infrastructure management time by 90% and improving system uptime.
- Applied query optimization principles to dynamic SQL, utilizing indexes and query plan analysis for efficient execution.
- Used Jira for ticketing and issue tracking, and Jenkins for continuous integration and continuous deployment.
- Worked with container orchestration tools such as Docker Swarm, Mesos, and Kubernetes.
- Managed data migration projects, including importing and exporting data to and from MongoDB, ensuring data integrity and consistency throughout the process.
- Monitored servers using Nagios and CloudWatch, and used the ELK stack (Elasticsearch and Kibana).
- Set up CI/CD pipelines using Jenkins, Maven, GitHub, Chef, Terraform, and AWS.
- Used AWS to create storage resources and define resource attributes, such as disk type or redundancy type, at the service level.
Environment: Apache Mesos, API, AWS, CI/CD, Docker, DynamoDB, ETL, Git, HBase, HDFS, Hive, Java, Jenkins, Jira, Kafka, Kubernetes, Lambda, MapReduce, Pig, Python, Scala, Spark, Spark Streaming, SQL
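
For illustration, a minimal sketch of a Lambda handler that backs up DynamoDB stream records to S3, assuming the function is wired to a DynamoDB Streams trigger and has write access to the bucket; the bucket name and key layout are placeholders:

    # Illustrative sketch only: persist DynamoDB stream records to S3 as JSON backups.
    # BACKUP_BUCKET and the key layout are assumptions, not the production configuration.
    import json
    import os
    from datetime import datetime, timezone

    import boto3

    s3 = boto3.client("s3")
    BACKUP_BUCKET = os.environ.get("BACKUP_BUCKET", "dynamodb-stream-backups")

    def handler(event, context):
        records = event.get("Records", [])
        if not records:
            return {"backed_up": 0}
        # One S3 object per invocation, keyed by UTC timestamp, holding the raw stream records.
        key = "backups/" + datetime.now(timezone.utc).strftime("%Y/%m/%d/%H%M%S%f") + ".json"
        s3.put_object(Bucket=BACKUP_BUCKET, Key=key, Body=json.dumps(records).encode("utf-8"))
        return {"backed_up": len(records), "s3_key": key}
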
Client: Deutsche Bank, Mumbai, India (Nov 2020 - Jul 2022)
Role: Application Developer / Data Engineer
Description: Deutsche Bank, a stalwart of German finance, offers a comprehensive suite of global banking and financial services, underpinned by a rich history and a commitment to innovation. Managed and optimized data workflows, ensuring seamless integration and transformation from diverse sources to enhance business intelligence.
Responsibilities:
- Developed data ingestion pipelines on an Azure HDInsight Spark cluster using Azure Data Factory and Spark SQL; also worked with Cosmos DB (SQL API and Mongo API).
- Created a custom logging framework for ELT pipeline logging using Append Variable activities in Data Factory.
- Managed, configured, and scheduled resources across the cluster using Azure Kubernetes Service.
- Developed a fully automated continuous integration system using Git, Jenkins, MySQL, and custom tools developed in Python and Bash.
- Wrote Spark Streaming applications to consume data from Kafka topics and write the processed streams to HBase (see the sketch following this section).
- Developed Spark applications using Spark SQL in Databricks for data extraction, transformation, and aggregation from multiple file formats, analyzing and transforming the data to uncover insights into customer usage patterns.
- Performed data wrangling to clean, transform, and reshape data using the pandas library.
- Worked on Jenkins pipelines to run various steps, including unit tests, integration tests, and static analysis tools.
- Wrote data ingestion systems to pull data from traditional RDBMS platforms such as Oracle and Teradata and store it in NoSQL databases such as MongoDB.
- Created pipelines in ADF using linked services, datasets, and pipelines to extract, transform, and load data between sources such as Azure SQL, Blob Storage, and Azure SQL Data Warehouse, including the write-back tool, and back.
- Wrote queries in MySQL and native SQL.
- Monitored and scheduled pipelines using triggers in Azure Data Factory.
- Involved in the entire project lifecycle, including design, development, deployment, testing, implementation, and support.
- Led requirement gathering, business analysis, and technical design for Hadoop and Big Data projects.
- Spearheaded HBase setup and utilized Spark and Spark SQL to develop faster data pipelines, resulting in a 60% reduction in processing time and improved data accuracy.
- Created and maintained CI/CD (continuous integration and deployment) pipelines and applied automation to environments and applications using tools such as Git, Terraform, and Ansible.
- Imported data from various sources into Spark RDDs for processing.
- Optimized Hive analytics SQL queries, created tables and views, and wrote custom UDFs and Hive-based exception processing.
- Developed database triggers and stored procedures using T-SQL cursors and tables.
Environment: API, Azure, Cosmos DB, Data Factory, ELT, Git, HBase, HDInsight, Hive, Jenkins, Kafka, Kubernetes, MySQL, Python, Spark, Spark SQL, SQL
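
For illustration, a minimal Structured Streaming sketch of the Kafka-to-HBase flow described above, assuming the spark-sql-kafka package is available and HBase is reached through its Thrift server via the happybase client; brokers, topic, table, and column family are placeholders:

    # Illustrative sketch only: consume a Kafka topic and land each record in HBase via foreachBatch.
    # Broker, topic, HBase host, table name, and the 'cf' column family are placeholders.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("kafka_to_hbase").getOrCreate()

    events = (spark.readStream.format("kafka")
              .option("kafka.bootstrap.servers", "broker:9092")
              .option("subscribe", "transactions")
              .load()
              .selectExpr("CAST(key AS STRING) AS key", "CAST(value AS STRING) AS value"))

    def write_to_hbase(batch_df, batch_id):
        import happybase  # assumes the HBase Thrift server is reachable from the driver
        conn = happybase.Connection("hbase-thrift-host")
        table = conn.table("transactions")
        # collect() is acceptable for small micro-batches; high volume would use batched puts per partition.
        for row in batch_df.dropna(subset=["key", "value"]).collect():
            table.put(row["key"].encode("utf-8"), {b"cf:payload": row["value"].encode("utf-8")})
        conn.close()

    query = (events.writeStream
             .foreachBatch(write_to_hbase)
             .option("checkpointLocation", "/tmp/checkpoints/kafka_to_hbase")
             .start())
    query.awaitTermination()
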
Client: Jio, Mumbai, India (Mar 2019 - Oct 2020)
Role: Data Engineer
Description: Jio is a pioneering telecommunications company in India, renowned for its widespread network and transformative digital services, reshaping the telecom landscape with groundbreaking innovations. Utilized advanced SQL querying skills to extract, transform, and load data from multiple relational databases.
Responsibilities:
- Used Azure Data Factory to ingest data from log files and custom business applications, processed the data on Databricks per day-to-day requirements, and loaded it into Azure Data Lake.
- Developed a reusable framework, leveraged for future migrations, that automates ETL from RDBMS systems to the data lake using Spark data sources and Hive data objects.
- Created, debugged, scheduled, and monitored jobs using Airflow for ETL batch processing that loads into Snowflake for analytical processing (see the sketch following this section).
- Utilized Azure Logic Apps to build workflows that schedule and automate batch jobs by integrating apps, ADF pipelines, and other services such as HTTP requests and email triggers.
- Responsible for implementing monitoring solutions with Ansible, Terraform, Docker, and Jenkins.
- Developed a Spark Streaming application to read raw packet data from Kafka topics, format it as JSON, and push it back to Kafka for future use cases.
- Created several Databricks Spark jobs with PySpark to perform table-to-table operations.
- Developed a front-end GUI as a stand-alone Python application.
- Developed multiple notebooks using PySpark and Spark SQL in Databricks for data extraction, analyzing and transforming the data according to business requirements.
- Created and maintained technical documentation for launching Hadoop clusters and for executing Hive queries and Pig scripts.
- Actively participated in all phases of the Software Development Life Cycle (SDLC), from implementation to deployment.
- Performed data analysis and data profiling using complex SQL queries on various source systems, including Oracle 10g/11g and SQL Server 2012.
Environment: Airflow, Azure, Azure Data Lake, Docker, ETL, Hive, Jenkins, JSON, Kafka, Oracle, Pig, PySpark, Python, RDBMS, Snowflake, Spark, Spark SQL, SQL
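
For illustration, a minimal Airflow DAG sketch of the batch load into Snowflake described above, assuming Airflow 2.x and the Snowflake Python connector; the account, credentials, stage, and table names are placeholders:

    # Illustrative sketch only: nightly extract step followed by a COPY INTO load to Snowflake.
    # Account, credentials, stage, and table names are placeholders.
    from datetime import datetime

    from airflow import DAG
    from airflow.operators.python import PythonOperator

    def extract_to_stage(**_):
        # Placeholder: pull from the source RDBMS and land files on the Snowflake stage.
        pass

    def load_into_snowflake(**_):
        import snowflake.connector  # assumes snowflake-connector-python is installed
        conn = snowflake.connector.connect(user="etl_user", password="********", account="my_account")
        try:
            conn.cursor().execute("COPY INTO analytics.daily_events FROM @etl_stage/daily_events/")
        finally:
            conn.close()

    with DAG(
        dag_id="rdbms_to_snowflake_batch",
        start_date=datetime(2020, 1, 1),
        schedule_interval="@daily",
        catchup=False,
    ) as dag:
        extract = PythonOperator(task_id="extract_to_stage", python_callable=extract_to_stage)
        load = PythonOperator(task_id="load_into_snowflake", python_callable=load_into_snowflake)
        extract >> load
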
TECHNICAL SKILLS
- AWS Services: S3, EC2, EMR, Redshift, RDS, Lambda, Kinesis, SNS, SQS, AMI, IAM, CloudFormation
- Hadoop Components / Big Data: HDFS, Hue, MapReduce, Pig, Hive, HCatalog, HBase, Sqoop, Impala, ZooKeeper, Flume, Kafka, YARN, Cloudera Manager, Kerberos, PySpark, Airflow, Snowflake, Spark components
- Databases: Oracle, Microsoft SQL Server, MySQL, DB2, Teradata
- Programming Languages: Java, Scala, Impala, Python
- Web Servers: Apache Tomcat, WebLogic
- IDEs: Eclipse, Dreamweaver
- NoSQL Databases: HBase, Cassandra, MongoDB
- Methodologies: Agile (Scrum), Waterfall, UML, Design Patterns, SDLC
- Currently Exploring: Apache Flink, Drill, Tachyon
- Cloud Services: AWS, Azure, Azure Data Factory (ETL/ELT/SSIS), Azure Data Lake Storage, Azure Databricks
- ETL Tools: Talend Open Studio & Talend Enterprise Platform
- Reporting and ETL Tools: Tableau, Power BI, AWS Glue, SSIS, SSRS, Informatica, DataStage
