Candidate Information
Title: Data Engineer Azure
Target Location: US-IN-Fort Wayne

Candidate's Name
Sr. Azure Data Engineer
EMAIL AVAILABLE | PHONE NUMBER AVAILABLE

Professional Summary:
- Experienced IT professional with 10+ years of expertise, focused on Azure Data Engineering for 6 years and specialized in data warehouse development for 4 years. Adept at designing and implementing cutting-edge solutions for data engineering and data warehouse environments.
- Demonstrated proficiency in designing and implementing end-to-end data solutions, leveraging Azure Cloud services such as Azure Data Factory, Databricks, Synapse Analytics, and Blob Storage.
- Expertise in orchestrating ETL operations at scale, optimizing data workflows for efficiency and reliability while ensuring seamless integration across diverse data sources.
- Extensive experience in real-time data processing using technologies like Azure Stream Analytics and Apache Kafka, enabling timely insights and decision-making.
- Implemented Apache Spark for distributed data processing, harnessing its parallel computing capabilities to handle large-scale analytics and processing tasks effectively.
- Leveraged Power BI for intuitive data visualization and interactive business intelligence reporting, facilitating informed decision-making within organizations.
- Designed and optimized SQL queries for efficient database operations, ensuring reliable data retrieval, manipulation, and management.
- Implemented Snowflake data warehousing solutions tailored to specific needs, ensuring efficient data storage, management, and analytics capabilities.
- Led the development of data pipelines using technologies like Apache Spark, Apache Airflow, and Azure Data Factory, ensuring reliable and timely data processing and delivery.
- Proficient in data integration and optimization with DBT (Data Build Tool) within data engineering workflows, enabling streamlined and maintainable data transformation processes while ensuring comprehensive documentation and lineage tracking for enhanced data governance and reliability.
- Enhanced Spark performance by optimizing data processing algorithms and leveraging techniques such as partitioning, caching, and broadcast variables (see the sketch after this summary).
- Utilized Linux shell scripting for Azure infrastructure management, ensuring robust and scalable operations.
- Proficient in leveraging PostgreSQL for efficient data storage, management, and querying, ensuring optimal performance in data pipelines and analytical workflows.
- Skilled in implementing event-driven architectures using Azure Event Hubs, enabling real-time ingestion, processing, and analysis of streaming data at scale.
- Actively participated in Agile ceremonies and collaborated with DevOps engineers, utilizing JIRA and Azure DevOps for effective project management.
- Developed and optimized PySpark-based data processing pipelines within Azure Databricks, ensuring scalability and performance of ETL processes.
- Proficient in developing and optimizing data processing pipelines using Python and Scala, leveraging frameworks like PySpark within Azure Databricks for scalable and high-performance ETL operations.
- Implemented automation using Azure Logic Apps, with integration into Azure Key Vault for enhanced security measures and data governance.
- Developed end-to-end data pipelines using Azure Data Factory services, ensuring efficient loading of data from various sources to Azure SQL Server.
- Built scalable and reliable ETL pipelines to pull large and complex data from different systems efficiently, ensuring data integrity and consistency.
- Utilized Agile methodologies to iteratively plan, develop, and deliver data engineering solutions, ensuring continuous collaboration and adaptability to evolving requirements.
- Proficient in utilizing Git and GitHub repositories for maintaining source code and enabling version control, ensuring codebase integrity and collaboration.
- Designed and implemented efficient data ingestion pipelines to collect structured and unstructured data from various sources, ensuring scalability and reliability.
- Stayed up to date with the latest big data technologies and best practices through research, training, and professional development, contributing to ongoing innovation and improvement.
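
As an illustration of the partitioning, caching, and broadcast-variable techniques cited above, a minimal PySpark sketch; the table paths, column names, and partition count are hypothetical placeholders, not details from this resume.

    from pyspark.sql import SparkSession
    from pyspark.sql.functions import broadcast

    spark = SparkSession.builder.appName("spark-tuning-sketch").getOrCreate()

    # Hypothetical inputs: a large fact table and a small dimension table.
    facts = spark.read.parquet("/data/facts")
    dims = spark.read.parquet("/data/dims")

    # Partitioning: co-locate rows that share the join/aggregation key to limit shuffles.
    facts = facts.repartition(200, "customer_id")

    # Caching: keep an intermediate result in memory because it is reused below.
    facts.cache()

    # Broadcast join: ship the small dimension table to every executor,
    # avoiding a full shuffle of the large fact table.
    enriched = facts.join(broadcast(dims), "customer_id")

    enriched.groupBy("customer_id").count().write.mode("overwrite").parquet("/data/out")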

Technical Skills:
Data Integration & ETL: SQL, Azure Data Factory, Azure Databricks, Snowflake, Apache Kafka, Apache Flume, Apache Spark, Apache Airflow, ELT/ETL pipelines with Python, Logic Apps, Azure Functions, CI/CD frameworks (Jenkins), Azure Synapse Analytics
Data Storage & Retrieval: Azure Storage, Azure SQL DB, Blob Storage, MySQL, Cassandra, Azure Data Lake Storage (ADLS) Gen2, Azure Databricks, data lakes, HDFS
Big Data Technologies: Hadoop, Spark Core, Spark SQL, Hive, HBase, Sqoop, Pig, Apache Kafka, ZooKeeper, Flume, MapReduce, DataFrames, Spark Streaming, Python, Hadoop distributions (Cloudera, Hortonworks), Data Build Tool (DBT), PySpark, Scala
Programming Languages: Python, SQL, PL/SQL, Scala
Web Technologies: HTML, CSS, JavaScript, XML, JSP, RESTful, SOAP
Build Automation Tools: Ant, Maven
Version Control: Git, GitHub
Databases: Azure SQL DB, Azure Synapse, MS Excel, MS Access, Oracle 11g/12c, Cosmos DB
Data Analysis & Reporting: Power BI, Tableau, Excel sheets, flat files, CSV files, text files, SQL Reporting Services
Collaboration & Project Management: Agile methodology, CI/CD pipelines, DevOps collaboration

Professional Experience:

Client: American Express, Chicago, USA. Sep 2022 to Present
Role: Azure Data Engineer with Snowflake
Responsibilities:
Developed comprehensive data infrastructure on Azure, integrating ETL pipelines, real-time processing, and containerization for efficient, scalable data management. Leveraged Python, Power BI, Snowflake, and Apache Kafka to streamline workflows, enhance analytics, and automate processes, ensuring agility and reliability in decision-making.
- Managed end-to-end ETL operations using Azure Data Factory, showcasing proficiency in orchestrating the extraction, transformation, and loading of data at large scale for fraud detection and prevention.
- Leveraged Azure Synapse Analytics for large-scale data processing, demonstrating competence in handling complex analytical workloads essential for fraud detection and prevention.
- Implemented Apache Spark for distributed data processing, harnessing its parallel computing capabilities to efficiently handle large-scale data analytics and processing tasks.
- Applied Python programming for data analysis, machine learning, and automation, showcasing versatility in developing robust solutions for diverse tasks within data science and software development.
- Leveraged Power BI for data visualization and interactive business intelligence reporting, providing actionable insights and facilitating informed decision-making within the organization.
- Developed and optimized SQL queries for efficient database operations, ensuring reliable data retrieval, manipulation, and management in support of business-critical applications.
- Implemented TIBCO EMS for high-performance messaging and event-driven architectures, facilitating reliable and efficient real-time data ingestion, processing, and integration across enterprise systems, enhancing responsiveness and scalability in data engineering solutions.
- Implemented Slowly Changing Dimension (SCD) techniques in data warehousing, enabling effective tracking and management of historical changes in data, crucial for maintaining accurate and meaningful analytics over time.
- Implemented Azure Blob Storage, optimizing data organization and storage efficiency for critical fraud detection and prevention information.
- Designed and implemented a data warehouse architecture using Snowflake, leveraging its cloud-native platform to achieve scalable, elastic, and high-performance analytics with a focus on simplicity and ease of maintenance.
- Organized real-time data streaming and event-driven architectures by implementing Apache Kafka, ensuring reliable, scalable, and fault-tolerant data pipelines for seamless communication and processing across distributed systems.
- Established and automated complex workflows by deploying Apache Airflow, enhancing data pipeline orchestration, scheduling, and monitoring for improved efficiency and reliability in data processing tasks.
- Implemented a data build tool to automate the end-to-end process of extracting, transforming, and loading (ETL) data, streamlining data workflows and ensuring consistent, reliable data outputs for analytics and reporting.
- Developed Data Definition Language (DDL) scripts for Stage and Operational Data Store (ODS) tables in Azure SQL Database, demonstrating expertise in database design and meticulous management for efficient data storage and retrieval.
- Led the development of DBT models, macros, and data marts in Snowflake, showcasing proficiency in crafting efficient and structured data models to support advanced analytics and reporting needs.
- Implemented Snowflake's Time Travel feature to enable historical data analysis and auditing, ensuring data integrity and regulatory compliance while providing insights into past data states for informed decision-making (see the sketch after this section).
- Spearheaded the containerization of data processing applications using Azure Kubernetes, emphasizing commitment to scalable and portable solutions crucial for fraud detection and prevention systems.
- Innovated by crafting custom activities using Azure Functions and PowerShell scripts, streamlining processes and significantly enhancing operational efficiency through automation and tailored solutions.
- Utilized real-time data processing in Azure Stream Analytics, highlighting agility in managing and analyzing data streams for timely insights and decision-making in fraud detection and prevention.
- Implemented automation using Azure Logic Apps, with integration into Azure Key Vault for enhanced security measures and data governance.
- Utilized scripting languages like Python and Scala for hands-on programming in various project tasks.
- Spearheaded the implementation of CI/CD pipelines and frameworks with Azure DevOps, ensuring a streamlined and automated development lifecycle.
- Documented technical specifications, data flow diagrams, and process documentation for the Azure architecture, ensuring clear and comprehensive documentation for knowledge transfer and future reference.
- Collaborated with cross-functional teams to meet project deadlines, facilitating clear and concise communication among team members from various departments to ensure alignment and progress toward project goals.
Environment: Azure Blob Storage, Azure Data Factory, Azure Synapse Analytics, Azure Databricks, Azure Kubernetes, Azure DevOps, Azure SQL Database, Python, Azure Logic Apps, Azure Key Vault, ADLS Gen2, Snowflake schema, Data Build Tool (DBT), Apache Spark, Scala, SQL Server, Power BI.
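
To make the Snowflake Time Travel bullet above concrete, a minimal sketch using the snowflake-connector-python package; the connection parameters and the ORDERS table are hypothetical placeholders, and the query simply reads the table as it existed one hour ago.

    import snowflake.connector

    # Hypothetical credentials and objects; replace with real account details.
    conn = snowflake.connector.connect(
        account="my_account",
        user="my_user",
        password="my_password",
        warehouse="ANALYTICS_WH",
        database="ANALYTICS_DB",
        schema="PUBLIC",
    )

    try:
        cur = conn.cursor()
        # Time Travel: query the table as it existed 3600 seconds (one hour) ago,
        # useful for auditing or recovering from a bad load.
        cur.execute("SELECT COUNT(*) FROM ORDERS AT(OFFSET => -3600)")
        print(cur.fetchone())
    finally:
        conn.close()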

Client: Value Labs Inc., Princeton, NJ. Jan 2021 to Aug 2022
Role: Azure Data Engineer
Responsibilities:
Implemented comprehensive data solutions on Azure, optimizing ingestion, ETL processes, and real-time analytics. Leveraged a variety of tools including Apache Kafka, Azure Data Factory, Snowflake, and Spark, while collaborating cross-functionally and streamlining deployments through CI/CD pipelines.
- Enhanced data-fetching efficiency through optimized query techniques and indexing strategies, ensuring scalability and smooth functioning of ETL data pipelines.
- Ingested data into various Azure services such as Azure Data Lake, Azure Storage, Azure SQL, and Azure DW, and processed the data in Azure Databricks.
- Worked on Microsoft Azure services such as HDInsight clusters, Blob Storage, Data Factory, and Logic Apps, and delivered a proof of concept (POC) on Azure Databricks.
- Leveraged Azure Synapse for efficient data transfer, enabling seamless integration between disparate data sources and data lakes and facilitating streamlined ETL processes and real-time analytics for Azure-based data engineering solutions.
- Deployed and optimized Python web applications through Azure DevOps CI/CD, freeing the team to focus on development.
- Developed enterprise-level solutions using batch and streaming frameworks with Spark Streaming and Apache Kafka.
- Developed and optimized PySpark-based data processing pipelines within Azure Databricks, utilizing distributed computing capabilities to efficiently handle large-scale data transformations and analytics tasks, ensuring scalability and performance of ETL processes.
- Implemented data storage and retrieval solutions using Azure Cosmos DB, ensuring highly available and globally distributed data access and integrating seamlessly with other Azure services like Data Factory and Databricks for efficient ETL processes and real-time analytics.
- Implemented Snowflake data warehousing solutions tailored to specific needs, ensuring efficient data storage, management, and analytics capabilities.
- Provided expertise in Snowflake data modeling, schema design, and optimization, enabling the business to derive valuable insights from its data assets efficiently and effectively.
- Developed and maintained end-to-end data pipelines using Apache Spark, Apache Airflow, and Azure Data Factory, ensuring reliable and timely data processing and delivery.
- Collaborated with cross-functional teams to gather requirements, design data integration workflows, and implement scalable data solutions.
- Enhanced Spark performance by optimizing data processing algorithms and leveraging techniques such as partitioning, caching, and broadcast variables.
- Leveraged Linux shell scripting for Azure infrastructure management, including deployment, monitoring, and maintenance of data solutions, ensuring robust and scalable operations.
- Wrote Hive queries for data analysis to meet specified business requirements, creating Hive tables and working with them in HiveQL to simulate MapReduce functionality.
- Implemented Snowflake's Time Travel feature to enable historical data analysis and auditing, ensuring data integrity and regulatory compliance while providing insights into past data states for informed decision-making.
- Leveraged PostgreSQL for efficient data storage, management, and querying, ensuring optimal performance and reliability in data pipelines and analytical workflows.
- Implemented event-driven architectures using Azure Event Hubs, enabling real-time ingestion, processing, and analysis of streaming data at scale.
- Implemented DBT (Data Build Tool) for transforming data in the warehouse, creating reusable models, and generating clear documentation and lineage, ensuring efficient and maintainable data transformation workflows.
- Administered Teradata environments, including installation, configuration, monitoring, and maintenance tasks, ensuring high availability, reliability, and scalability of the data infrastructure to meet business requirements.
- Configured linked services and utilized Copy Activity for data movement and Lookup Activity for data enrichment and validation within Azure Data Factory.
- Developed and optimized data processing pipelines using Scala and PySpark within Azure Databricks, leveraging distributed computing capabilities to handle large-scale data transformations and ensuring efficient ETL processes and robust data analytics solutions.
- Used Spark Streaming to divide streaming data into batches as input to the Spark engine for batch processing (see the sketch after this section).
- Utilized Git and GitHub repositories to maintain the source code and enable version control.
Environment: Azure Databricks, Azure Data Factory, Azure Logic Apps, Azure Blob Storage, Azure Cosmos DB, MapReduce, Snowflake, Apache Spark, PySpark, Azure Synapse, Azure Key Vault, Azure Data Lake Storage, Azure SQL Database, DBT, PostgreSQL, Scala, Apache Kafka, Hive, Git.
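
A minimal sketch of the Spark Streaming / Kafka pattern described above, written with Structured Streaming's micro-batch trigger; the broker address, topic, and output paths are hypothetical, and the job assumes the spark-sql-kafka connector package is available.

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("kafka-microbatch-sketch").getOrCreate()

    # Read the Kafka topic as a streaming DataFrame (placeholder broker and topic).
    events = (
        spark.readStream.format("kafka")
        .option("kafka.bootstrap.servers", "broker:9092")
        .option("subscribe", "events")
        .load()
        .selectExpr("CAST(value AS STRING) AS value", "timestamp")
    )

    # Micro-batch trigger: the stream is cut into 30-second batches that the
    # Spark engine processes like ordinary batch jobs.
    query = (
        events.writeStream.format("parquet")
        .option("path", "/data/streaming/events")
        .option("checkpointLocation", "/data/streaming/_checkpoints/events")
        .trigger(processingTime="30 seconds")
        .start()
    )
    query.awaitTermination()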

Client: Johnson & Johnson, Cincinnati, OH. Nov 2018 to Dec 2020
Role: Azure Data Engineer
Responsibilities:
Implemented migration strategies with Azure services for Johnson & Johnson, orchestrating end-to-end data pipelines with Azure Data Factory and Spark. Leveraged Azure Synapse Analytics for real-time insights, Python batch processors, RESTful APIs, and Git for version control, following Agile methodologies for efficient project management.
- Designed and implemented migration strategies with the Azure suite: Azure SQL Database, Azure Data Factory (ADF) V2, Azure Key Vault, and Azure Blob Storage.
- Created end-to-end data pipelines using ADF services to load data from on-premises sources to Azure SQL Server for data orchestration.
- Built scalable and reliable ETL pipelines to pull large and complex data from different systems efficiently.
- Built data pipelines using Azure services such as Data Factory to load data from legacy SQL Server systems to Azure Data Warehouse using Data Factory pipelines and Databricks notebooks.
- Leveraged Azure Synapse Analytics to streamline data integration, analysis, and reporting processes for Johnson & Johnson, enabling real-time insights and informed decision-making.
- Utilized Azure Data Lake Storage for scalable and cost-effective storage of structured and unstructured data, enabling Johnson & Johnson to efficiently manage large volumes of diverse data types while ensuring data security and compliance.
- Developed and optimized PySpark-based ETL pipelines within Azure Databricks to efficiently process and transform large-scale data, ensuring robust and scalable data workflows for real-time analytics and reporting.
- Implemented Databricks notebooks for interactive data processing and analytics, enabling seamless integration with Azure Data Factory to orchestrate data loads from legacy systems to Azure Data Warehouse and enhancing data pipeline efficiency and maintainability.
- Developed Spark applications using PySpark and Spark SQL for data extraction, transformation, and aggregation from multiple file formats for analysis and transformation.
- Developed RESTful APIs, including defining resources, handling requests, and returning responses in JSON or XML formats.
- Built Kafka-based data pipelines covering data ingestion, processing, and analysis.
- Applied Python libraries commonly used in data engineering, such as Pandas, NumPy, PySpark, and scikit-learn.
- Used Pub/Sub messaging systems in combination with big data technologies such as Hadoop and Spark.
- Implemented real-time data processing solutions using Scala and Apache Kafka, enabling efficient data ingestion, transformation, and analysis to support scalable and reliable data pipelines for dynamic analytics and reporting.
- Built change data capture (CDC) pipelines, including designing schema changes, capturing changes in real time, and applying changes to downstream systems.
- Developed a Python-based RESTful web service using Flask and was involved in the analysis, design, development, and production phases of the application (see the sketch after this section).
- Constructed product-usage data aggregations using PySpark and Spark SQL and maintained them in Azure Data Warehouse for reporting, data science dashboarding, and ad hoc analyses.
- Leveraged Jenkins for continuous integration, created a Git repository, and added the project to GitHub.
- Worked on AJAX-driven applications by invoking web services/APIs and parsing the JSON responses.
- Utilized Agile processes and JIRA issue management to track sprint cycles.
Environment: Azure SQL Database, Azure Databricks, Azure Data Factory (ADF) V2, Azure Key Vault, Azure Blob Storage, PySpark, Spark SQL, RESTful APIs, Kafka, Python, Hadoop, Scala, CDC pipelines, Jenkins, AJAX, Git, GitHub, Agile, JIRA.
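
A minimal Flask sketch of the kind of RESTful web service mentioned above; the /items resource and its in-memory store are hypothetical stand-ins for illustration only, not the actual application.

    from flask import Flask, jsonify, request

    app = Flask(__name__)

    # In-memory stand-in for a real data store (illustrative only).
    ITEMS = {1: {"id": 1, "name": "example"}}

    @app.route("/items/<int:item_id>", methods=["GET"])
    def get_item(item_id):
        item = ITEMS.get(item_id)
        if item is None:
            return jsonify({"error": "not found"}), 404
        return jsonify(item)

    @app.route("/items", methods=["POST"])
    def create_item():
        payload = request.get_json(force=True)
        item_id = max(ITEMS) + 1
        ITEMS[item_id] = {"id": item_id, **payload}
        return jsonify(ITEMS[item_id]), 201

    if __name__ == "__main__":
        app.run(debug=True)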

Client: Guardian, Cincinnati, OH. April 2017 to Oct 2018
Role: Big Data Developer
Responsibilities:
Developed and managed robust data pipelines for diverse sources, ensuring security and compliance. Optimized big data processing with Apache Spark, enabling presentation of insights through predictive models and interactive dashboards and fostering collaboration and documentation for maintainability.
- Designed and implemented efficient data ingestion pipelines to collect structured and unstructured data from various sources, such as customer databases, third-party data providers, social media, and IoT devices.
- Developed ETL processes to clean, transform, and load the incoming data into the company's data lake or data warehouse.
- Set up and maintained scalable, distributed data storage solutions such as the Hadoop Distributed File System (HDFS).
- Ensured data security and compliance with industry regulations, such as HIPAA and GDPR, through access controls, data encryption, and data governance policies.
- Integrated Oracle Database solutions with data ingestion pipelines to leverage robust relational database management capabilities for structured data storage, querying, and processing.
- Utilized Oracle's advanced analytics and machine learning features to enhance predictive modeling, data analysis, and business intelligence efforts, driving more informed decision-making.
- Developed and optimized big data processing pipelines using frameworks such as Apache Spark, Apache Flink, and Apache Kafka.
- Developed and optimized PySpark scripts and libraries to automate data processing tasks within ingestion pipelines, facilitating seamless integration of diverse data sources and enhancing scalability and efficiency in big data workflows.
- Developed and optimized Spark SQL queries to enhance data processing workflows, enabling efficient ETL operations and real-time analytics within the Hadoop ecosystem.
- Developed Python-based scripts and libraries to enhance data processing efficiency and automation within data ingestion pipelines, enabling seamless integration of diverse data sources and facilitating rapid development and deployment cycles.
- Performed data cleaning, transformation, and analysis tasks on large datasets to extract valuable insights.
- Developed complex SQL queries and stored procedures to streamline data extraction, transformation, and loading (ETL) processes, integrating with data lakes and warehouses for enhanced data analytics and reporting.
- Created interactive dashboards and reports using tools like Power BI.
- Defined and implemented data governance policies and procedures to ensure data quality, consistency, and lineage.
- Optimized data pipelines, queries, and algorithms for better performance and scalability.
- Provisioned and managed cloud resources (e.g., EC2 instances, EMR clusters) for big data workloads.
Environment: Apache Spark, Apache Kafka, MapReduce, Python, Hadoop Distributed File System (HDFS), Spark Streaming, PySpark, Spark SQL, Elastic Compute Cloud (EC2), Power BI.

Client: Brinks Inc., Houston, TX. Mar 2013 to May 2015
Role: ETL Developer
Responsibilities:
Designed and implemented comprehensive ETL processes in SQL Server environments, ensuring data integrity and performance optimization through triggers, complex SQL queries, and stored procedures. Leveraged SSIS extensively, conducted performance tuning, and utilized advanced techniques like window functions for efficient ETL operations.
- Developed ETL processes by creating new database objects using T-SQL in both Development and Production environments for SQL Server 2008 R2.
- Collaborated in the creation of fact and dimension table implementations within the Star Schema model, aligning with ETL requirements.
- Utilized persistent staged tables in Teradata for ETL systems to load, replace, and recover in case of failures.
- Employed SQL queries and stored procedures to extract, transform, and load (ETL) data from various source systems into the data warehouse, ensuring data accuracy, consistency, and integrity throughout the process.
- Formulated ETL SQL queries to extract complex data from various tables in remote databases, employing joins and database links, and formatted the results into reports while maintaining comprehensive logs.
- Employed SQL Profiler and Query Analyzer to optimize DTS (Data Transformation Services) package queries and stored procedures for efficient ETL processing.
- Utilized SSIS (SQL Server Integration Services) extensively for seamless ETL operations.
- Implemented database triggers to enforce data integrity and referential integrity as part of the ETL workflow.
- Implemented stored procedures tailored for transforming data, extensively utilizing T-SQL to address various transformation needs during the data loading phase of ETL.
- Utilized Vertica's time-series and temporal capabilities for audit logging and tracking historical data via ETL code.
- Created JavaScript functions integrated with Python scripts deployed on the Hadoop ecosystem to enable seamless ingestion and processing of real-time data records from numerous sources.
- Contributed to data warehousing efforts by designing and implementing efficient storage structures and optimizing data retrieval processes.
Environment: SQL Server 2008 R2, T-SQL, PL/SQL, ETL, Vertica, Teradata, SSIS, Star Schema, SQL Profiler, SQL Query Analyzer.

Education:
Master's degree from New England College, 2012.
