AZURE DATA ENGINEER
Candidate's Name
E-mail: EMAIL AVAILABLE
Phone no.: PHONE NUMBER AVAILABLE
LinkedIn: https://LINKEDIN LINK AVAILABLE

Professional Summary:
- Highly experienced and proficient Azure Data Engineer with 10+ years of experience in designing, building, and managing scalable data solutions on the Azure platform.
- Proven expertise in architecting data pipelines, building data lakes, and leveraging Azure Data Factory, Azure Databricks, Azure Synapse Analytics, Azure Data Lake Gen2, Azure Event Hubs, and Azure Blob Storage to unlock actionable insights from data.
- Spearheaded the development and execution of robust data integration and loading using Azure Data Factory, orchestrating seamless extraction, transformation, and loading (ETL) across diverse data sources.
- Integrated Azure Logic Apps into data workflows for seamless automation, allowing for event-driven data processing.
- Built ETL data pipelines using PySpark, Spark SQL, and Scala to ingest, transform, and load large volumes of structured and unstructured data (a representative sketch follows this summary).
- Proficient in Python and SQL, with a strong background in building and optimizing data lakes, data warehouses, and ETL processes.
- Adept at leveraging Agile methodologies to deliver high-quality, scalable data solutions. Experienced in working with data serialization formats such as Parquet, Avro, and ORC to ensure efficient data storage and retrieval.
- Expertise in defining roles and privileges for database access by implementing Identity and Access Management (IAM) and Role-Based Access Control (RBAC), ensuring data security and appropriate access.
- Implemented Azure Functions, Azure Storage, and Service Bus queues for large enterprise-level ERP integration systems.
- Experience in streaming applications using Azure Event Hubs, Azure Stream Analytics, and Azure Synapse Analytics.
- Experience in creating and managing Azure DevOps tools for continuous integration and deployment (CI/CD) pipelines.
- Optimized data ingestion, data modeling, data encryption, and performance by tuning ETL workflows.
- Implemented and managed data governance solutions using Azure Key Vault, Azure Active Directory, and Azure Purview to ensure data quality, compliance, and security.
- Developed reusable Terraform modules for managing Azure networking, allowing rapid configuration changes and consistent security policies.
- Skilled in creating and managing ETL/ELT workflows with Apache Beam or Apache Airflow to optimize data extraction, transformation, and loading processes.
- Developed data ingestion workflows to read data from various sources and write it to Avro, Parquet, Sequence, JSON, and ORC file formats for efficient storage and retrieval.
- Expertise in database architecture for OLAP and OLTP applications, database design, data migration, and data warehousing concepts with an emphasis on ETL.
- Led the design and deployment of data integration solutions using Azure Data Factory, enhancing data flow and transformation processes.
- Implemented and managed data integration with Azure API Management, ensuring seamless connectivity between systems and applications.
- Worked closely with data scientists, analysts, and IT teams to understand requirements and deliver optimized data solutions.
- Experience building and optimizing large-scale data pipelines with Apache Hadoop, Java, HDFS, MapReduce, Hive, and Tez.
- Experience in using Apache Sqoop to import and export data between Relational Database Systems and HDFS.
- Optimized Hadoop job execution using Oozie workflows with conditional branching and data dependencies.
- Experience in optimizing query performance in Hive using bucketing and partitioning techniques.
- Optimized Spark jobs and workflows by tuning Spark configurations, partitioning, and memory allocation settings.
- Experienced in working with real-time streaming data using Apache Kafka as the data pipeline and leveraging the Spark Streaming module for data processing.
- Configured and managed APIs using Azure API Management to ensure secure, scalable access to data and services. Implemented security policies, rate limits, and caching to enhance API performance and reliability.
- Well versed in using ETL methodology to support corporate-wide solutions using Informatica.
- Expertise in using various Hadoop infrastructure components such as MapReduce, Pig, Hive, Zookeeper, Sqoop, Oozie, Flume, and Drill.
- Proficient in querying and managing data using both SQL and NoSQL databases for diverse data architectures; mastered distributed database technologies like Cassandra for highly scalable and fault-tolerant data storage.
- Extensive experience in developing, maintaining, and implementing Enterprise Data Warehouses (EDW), Data Marts, ODS, and data warehouses with Star schema and Snowflake schema.
- Maintained and administered Git source code repositories and GitHub Enterprise.
- Comprehensive knowledge of the Software Development Life Cycle; worked with Agile and Waterfall methodologies.
- Collaborated seamlessly with data analysts and stakeholders to implement well-aligned data models, structures, and designs.
- Dedicated to keeping up with the latest developments and industry best practices in cloud computing and data engineering technologies.
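The kind of PySpark batch ETL pipeline summarized above can be sketched as follows. This is a minimal, illustrative example only: the ADLS Gen2 storage account, container paths, schema, and column names are hypothetical and not taken from any actual project.

```python
from pyspark.sql import SparkSession, functions as F

# Illustrative ADLS Gen2 paths -- storage account, containers, and dataset are hypothetical.
RAW_PATH = "abfss://raw@examplelake.dfs.core.windows.net/sales/"
CURATED_PATH = "abfss://curated@examplelake.dfs.core.windows.net/sales/"

spark = SparkSession.builder.appName("sales-batch-etl").getOrCreate()

# Ingest: read semi-structured JSON landed by an upstream process.
raw = spark.read.json(RAW_PATH)

# Transform: basic cleansing, typing, and deduplication.
curated = (
    raw.dropDuplicates(["order_id"])                       # deduplication check
       .filter(F.col("order_id").isNotNull())              # null handling
       .withColumn("order_ts", F.to_timestamp("order_ts"))
       .withColumn("order_date", F.to_date("order_ts"))
       .withColumn("amount", F.col("amount").cast("decimal(18,2)"))
)

# Load: write columnar Parquet, partitioned by date for efficient downstream queries.
(
    curated.write
           .mode("overwrite")
           .partitionBy("order_date")
           .parquet(CURATED_PATH)
)
```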
Education:
- Master's in Computer Science, University of Central Missouri                Aug 2012 - Dec 2013
- Bachelor of Engineering in Computer Science, Osmania University, Hyderabad  2011

Certifications:
- AZ-900: Microsoft Azure Fundamentals
- DP-203: Microsoft Azure Data Engineer Associate

Technical Skills:
Big Data Technologies:    MapReduce, Hive, Tez, PySpark, Scala, Kafka, Spark Streaming, Oozie, Sqoop, Pig, Zookeeper, HDFS
Hadoop Distributions:     Cloudera, Hortonworks
Azure Services:           Azure Data Factory, Azure Databricks, Logic Apps, Azure Synapse Analytics, Azure Functions, Azure DevOps, Azure Event Hubs, Azure Data Lake, PolyBase
Languages:                SQL, PL/SQL, Python, HiveQL, Scala, PySpark
Web Technologies:         HTML, CSS, JavaScript, XML, JSP, RESTful, SOAP
Operating Systems:        Windows (XP/7/8/10), UNIX, Linux, Ubuntu, CentOS
File Formats:             CSV, JSON, XML, ORC, Parquet, Delta
Build Automation Tools:   Ant, Maven, SBT
Version Control:          Git, GitHub
Methodology:              Agile, Scrum
IDE & Build Tools:        Eclipse, IntelliJ, Visual Studio
Databases:                MS SQL Server 2016/2014/2012, Azure SQL DB, MS Excel, MS Access, Oracle 11g/12c, Cosmos DB, HBase

Professional Experience:

Client: GOLDMAN SACHS, Irving, TX                                             Sep 2021 to Present
Role: Azure Data Engineer
Responsibilities:
- Analyzed data from Azure data stores using Databricks and Spark cluster capabilities, extracting valuable insights from large datasets.
- Developed and maintained end-to-end ETL data pipelines and worked with large data sets in Azure Data Factory; increased data pipeline throughput by 2x.
- Developed custom activities using Azure Functions, Azure Databricks, and PowerShell scripts to perform data transformations, data cleaning, and data validation.
- Led the design and implementation of end-to-end ETL processes using PySpark, optimizing data pipelines and reducing data processing times by 40%.
- Developed scalable data lakes and data warehouses, enabling efficient storage and retrieval of petabytes of transactional and analytical data.
- Collaborated with development teams to integrate APIs into applications and workflows, facilitating seamless data exchange and functionality.
- Ensured high levels of data quality and integrity through rigorous validation and testing procedures.
- Contributed to the implementation of data strategies, including data governance and compliance initiatives.
- Monitored and optimized system performance to ensure data integration processes met service level agreements (SLAs).
- Utilized Apache Spark and Spark SQL to perform complex data transformations and aggregations, improving the accuracy and speed of business reporting.
- Increased the efficiency of data retrieval through query optimization and indexing.
- Implemented a tiered data lake architecture with Azure Data Lake Storage Gen2 (ADLS Gen2) for batch and streaming data.
- Performed ETL tasks against replicated datasets to ensure quality, including deduplication checks, null handling, and validation rules.
- Set up near-real-time monitoring alerts and automated issue remediation workflows with Azure Monitor; cut incident response time by 50%.
- Worked with Azure Logic Apps administrators to monitor and troubleshoot issues related to process automation and data processing pipelines.
- Engineered data pipelines using PySpark and Databricks, resulting in a 30% increase in data processing efficiency.
- Integrated various data sources into a unified data lake architecture, supporting the organization's data-driven decision-making processes.
- Optimized SQL queries and Spark jobs to reduce computational costs and improve query performance in large-scale datasets.
- Worked extensively with Parquet, Avro, and ORC formats to manage and serialize structured and semi-structured data.
- Implemented Azure Event Hubs for real-time data ingestion, enabling efficient streaming and processing of high-volume data.
- Utilized data modeling and ETL processes to ensure data accuracy and consistency with healthcare regulations, including HIPAA.
- Experienced in using Parquet, Avro, and ORC for efficient storage and retrieval of large datasets in distributed systems.
- Optimized data storage and processing by selecting the appropriate serialization format based on the use case (e.g., Parquet for columnar storage, Avro for schema evolution).
- Configured and deployed Azure Purview services, including data catalog, data lineage, and data discovery capabilities.
- Enabled encryption in transit and at rest using Azure Key Vault, and implemented access governance using Azure RBAC and Active Directory groups, enabling cross-subscription data sharing while ensuring security compliance.
- Demonstrated expertise in using Azure Data Catalog to discover and document various data assets across the organization, including databases, tables, files, and data lakes.
- Designed and implemented Spark Streaming solutions for real-time data processing and analysis, enabling immediate insights and actionable intelligence (see the streaming sketch at the end of this section).
- Leveraged Azure DevOps for continuous integration and deployment (CI/CD) of data pipelines and applications, streamlining the development and deployment processes.
- Created a secure network on Azure using NSGs, load balancers, autoscaling, and Availability Zones to ensure 99.95% uptime during peak data loads.
- Leveraged Power BI and Azure Analysis Services to deliver interactive dashboards and enable self-service analysis.
- Configured data pipeline orchestration using YAML pipelines in Azure DevOps, ensuring efficient and reliable execution of data workflows.
- Participated in Agile ceremonies, including sprint planning, daily stand-ups, and retrospectives, contributing to the continuous improvement of team processes.
- Configured and optimized Azure HDInsight clusters to meet performance and scalability requirements.
- Utilized Terraform to automate infrastructure provisioning and management, ensuring consistent and reproducible deployments in an Azure environment.
- Designed and implemented Directed Acyclic Graph (DAG) workflows to automate complex data processing tasks.
- Strong experience in developing web services such as REST/RESTful APIs and in data mining using Requests in Python.
- Collaborated closely with the data engineering team to enhance and optimize data pipelines, improving data processing speed and efficiency.
- Expertise in processing JSON, Avro, Parquet, ORC, and CSV formats for efficient data ingestion, transformation, and storage.
- Actively participated in code reviews, troubleshooting, and performance tuning sessions to improve overall system performance and reliability.
- Orchestrated Docker containers for various applications, ensuring consistency across development, testing, and production environments.
- Created and maintained HiveQL scripts and jobs using tools such as Apache Oozie and Apache Airflow (a sample Airflow DAG also follows this section).
- Created a Git repository and added branching, tagging, and release activities on GitHub version control.
- Worked with JIRA to report on projects and create sub-tasks for Development, QA, and Partner validation.
- Experienced in Agile ceremonies, from daily stand-ups to internationally coordinated PI Planning.

Environment: Azure Databricks, Azure Data Factory, Azure Data Lake Gen2, Logic Apps, Azure Event Hubs, Azure HDInsight, Azure Purview, Azure Key Vault, Azure Active Directory, Azure Analysis Services, ELT/ETL, YAML, Containerization, Spark Streaming, Data Pipelines, Terraform, Azure DevOps, Apache Oozie, Apache Airflow, Spark, Hive, SQL, Python, Scala, PySpark, Power BI, Git, JIRA, Jenkins, Kafka
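A compressed sketch of the Spark Structured Streaming pattern referenced in this section, reading JSON events from Kafka (Azure Event Hubs also exposes a Kafka-compatible endpoint) and landing them in the lake with checkpointing. The broker, topic, schema, and paths are placeholders, not details from the actual engagement.

```python
from pyspark.sql import SparkSession, functions as F
from pyspark.sql.types import StructType, StructField, StringType, DoubleType, TimestampType

spark = SparkSession.builder.appName("events-stream").getOrCreate()

# Hypothetical event schema.
event_schema = StructType([
    StructField("device_id", StringType()),
    StructField("metric", DoubleType()),
    StructField("event_ts", TimestampType()),
])

# Read a stream of JSON events from Kafka (placeholder broker and topic).
events = (
    spark.readStream
         .format("kafka")
         .option("kafka.bootstrap.servers", "broker:9092")
         .option("subscribe", "device-events")
         .option("startingOffsets", "latest")
         .load()
         .select(F.from_json(F.col("value").cast("string"), event_schema).alias("e"))
         .select("e.*")
)

# Write micro-batches to the lake; the checkpoint gives the file sink exactly-once semantics.
query = (
    events.writeStream
          .format("parquet")
          .option("path", "abfss://stream@examplelake.dfs.core.windows.net/device_events/")
          .option("checkpointLocation", "abfss://stream@examplelake.dfs.core.windows.net/_checkpoints/device_events/")
          .trigger(processingTime="1 minute")
          .start()
)
query.awaitTermination()
```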
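And a minimal Apache Airflow DAG of the sort mentioned above for scheduling HiveQL and Spark jobs; the DAG id, schedule, and script paths are hypothetical.

```python
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.bash import BashOperator

# Placeholder DAG: nightly Hive staging refresh followed by a Spark aggregation job.
default_args = {"owner": "data-eng", "retries": 2, "retry_delay": timedelta(minutes=10)}

with DAG(
    dag_id="nightly_sales_refresh",
    start_date=datetime(2023, 1, 1),
    schedule_interval="0 2 * * *",   # 02:00 daily
    catchup=False,
    default_args=default_args,
) as dag:

    refresh_staging = BashOperator(
        task_id="refresh_staging",
        bash_command="beeline -f /jobs/hive/refresh_staging.hql",            # placeholder script
    )

    aggregate = BashOperator(
        task_id="aggregate_daily_sales",
        bash_command="spark-submit /jobs/spark/aggregate_daily_sales.py",    # placeholder script
    )

    # Run the aggregation only after the staging tables are refreshed.
    refresh_staging >> aggregate
```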
Client: STATE OF TX, Dallas, TX                                               Sep 2018 to Aug 2021
Role: Azure Data Engineer
Responsibilities:
- Created pipelines, data flows, and complex data transformations and manipulations using ADF and PySpark with Databricks, which decreased processing time by 20%.
- Diagnosed and resolved issues related to API connectivity, performance, and integration, leading to a 30% improvement in API reliability and user satisfaction.
- Established a foundation for real-time analytics with Azure IoT Hubs and Event Hubs to process millions of events per second into a data platform with near-zero latency or data loss.
- Designed and implemented data pipelines using Azure Event Hubs for real-time streaming and processing of large volumes of data.
- Monitored API performance and implemented enhancements to address latency and reliability issues.
- Designed and implemented data integration solutions using Azure Data Factory, optimizing ETL processes and enhancing data flow and transformation.
- Managed the integration of APIs with existing systems, utilizing Azure API Management to enable secure and efficient data access.
- Used advanced T-SQL features to design and tune queries interfacing with the Azure Synapse Analytics database and other applications as efficiently as possible, and created stored procedures for business logic in T-SQL.
- Identified and resolved bottlenecks in data pipelines and Spark jobs to improve system efficiency by 50%.
- Optimized database query performance using Python for caching, partitioning, and bucketing.
- Worked extensively on Azure Data Lake Analytics with Azure Databricks to implement SCD-1 and SCD-2 approaches (see the SCD-2 sketch at the end of this section).
- Ingested data into Azure cloud services such as Azure Data Lake, Azure Storage, Azure SQL, and Azure DW, and performed cloud migration by processing the data in Azure Databricks.
- Utilized Azure Key Vault as a central repository for maintaining secrets and referenced the secrets in Azure Data Factory and in Databricks notebooks.
- Troubleshot and resolved performance issues with Azure Synapse Analytics.
- Developed Stream Analytics jobs with anomaly detection rules to identify and respond to issues pre-emptively based on changing conditions across connected systems.
- Expert in data modeling techniques and in optimizing data storage and query performance in Synapse dedicated SQL pools.
- Familiar with developing and executing SQL queries, scripts, and stored procedures within Azure Data Studio.
- Implemented data backup and disaster recovery strategies for Azure Blob Storage using Azure Backup and Azure Site Recovery.
- Used NumPy, Pandas, SciPy, and PyTables for ad-hoc analysis, data cleansing, and preprocessing during the development lifecycle.
- Proficient in deploying scalable ML models via Azure ML Service, including containerization with Azure Kubernetes Service (AKS) and Azure Container Instances (ACI), and managing versions, endpoints, and deployments.
- Deployed metadata tagging standards throughout Azure Data Factory, allowing end-to-end lineage tracking of transformed data to trusted sources.
- Experienced in using Azure Data Share to enable seamless collaboration and data exchange between cross-functional teams.
- Implemented a DevOps culture utilizing GitHub workflows alongside Azure Boards and backlogs to increase collaboration between data engineering and site reliability engineering.
- Wrote Spark and Spark SQL transformations in Azure Databricks to perform complex transformations for business rule implementation.
- Leveraged Azure Cosmos DB and Azure Blob Storage to efficiently store and manage large volumes of IoT data with high availability and scalability.
- Streamlined PolyBase in Azure environments for seamless integration and querying across diverse data sources, optimizing data engineering workflows.
- Performed data analysis and profiling to identify data patterns, anomalies, and insights for business decision-making.
- Incorporated data governance and security measures to ensure compliance with industry regulations and protect sensitive data.
- Conducted performance tuning and optimization of data pipelines and queries to improve overall system efficiency.
- Provided technical guidance and support to team members, fostering knowledge sharing and partnership.
- Wrote Hive queries for data analysis to meet specified business requirements, creating Hive tables and working on them using HiveQL to simulate MapReduce functionalities.
- Developed RDDs and DataFrames (Spark SQL) using PySpark for analyzing and processing the data.
- Leveraged Spark Streaming and Azure Functions to divide streaming data into batches as input to the Spark engine for batch processing.
- Utilized JIRA for error and issue tracking and added several options to the application to choose a particular algorithm for data and address generation.
- Used Git and GitLab as version control tools to maintain the code repository while managing project tasks and issues.

Environment: Azure Databricks, Azure Data Factory, Azure Synapse Analytics, Spark, Azure Stream Analytics, Azure Machine Learning, Azure Functions, PolyBase, Azure Kubernetes Service, Azure Backup, Azure Site Recovery, Azure Container Instances, Azure Data Studio, Azure Key Vault, Azure IoT Hubs, Azure Data Lake, NumPy, PySpark, MapReduce, Hive, JIRA, Git, GitLab
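A simplified sketch of the SCD Type 2 pattern described in this section, assuming the customer dimension is stored as a Delta table in Databricks; the table path, business key, and tracked columns are invented for illustration.

```python
from pyspark.sql import SparkSession, functions as F
from delta.tables import DeltaTable

spark = SparkSession.builder.appName("scd2-customer-dim").getOrCreate()

DIM_PATH = "abfss://curated@examplelake.dfs.core.windows.net/dim_customer/"   # placeholder

# Incoming batch of changed customer records (source path and columns are illustrative).
updates = (
    spark.read.parquet("abfss://staging@examplelake.dfs.core.windows.net/customer_changes/")
         .withColumn("effective_date", F.current_date())
)

dim = DeltaTable.forPath(spark, DIM_PATH)

# SCD-2: close out the current row for changed keys and insert rows for brand-new keys.
# A second pass (re-merging the change set) would insert the new "current" version of
# changed keys; it is omitted here for brevity.
(
    dim.alias("t")
       .merge(
           updates.alias("s"),
           "t.customer_id = s.customer_id AND t.is_current = true",
       )
       .whenMatchedUpdate(
           condition="t.address <> s.address OR t.segment <> s.segment",
           set={"is_current": "false", "end_date": "s.effective_date"},
       )
       .whenNotMatchedInsert(
           values={
               "customer_id": "s.customer_id",
               "address": "s.address",
               "segment": "s.segment",
               "effective_date": "s.effective_date",
               "end_date": "null",
               "is_current": "true",
           },
       )
       .execute()
)
```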
Client: MACY'S, New York, NY                                                  Jan 2016 to Aug 2018
Role: Big Data Engineer
Responsibilities:
- Utilized Sqoop to import data from MySQL to the Hadoop Distributed File System (HDFS) on a regular basis, ensuring seamless data integration.
- Performed aggregations on large volumes of data using Apache Spark and Scala, and stored the processed data in the Hive warehouse for further analysis (see the aggregation sketch at the end of this section).
- Managed data lakes and big data ecosystems using Hadoop and Spark, leveraging their capabilities for efficient data processing.
- Integrated Parquet, Avro, and ORC with Hadoop, Spark, and Hive for scalable data processing and analysis.
- Implemented data pipelines leveraging these formats to enhance performance in ETL processes.
- Utilized Parquet, Avro, and ORC to reduce storage footprint and improve I/O efficiency through advanced compression techniques.
- Configured and tuned serialization settings to achieve an optimal balance between storage efficiency and processing speed.
- Migrated large volumes of data from legacy systems (Netezza, Oracle, SQL Server) to Hadoop.
- Successfully loaded and transformed large sets of structured, semi-structured, and unstructured data, enabling effective analysis and insight generation.
- Developed Hive queries to analyze data and meet specific business requirements, utilizing Hive Query Language (HiveQL) to simulate MapReduce functionalities.
- Built HBase tables by leveraging HBase integration with Hive on the Analytics Zone, facilitating efficient storage and retrieval of data.
- Standardized fault-tolerant and scalable data processing solutions by leveraging technologies such as Apache Spark.
- Applied Kafka and Spark Streaming to process streaming data for specific use cases, enabling real-time data analysis and insight generation.
- Designed and implemented a data pipeline using Kafka, Spark, and Hive, ensuring seamless data ingestion, transformation, and analysis.
- Applied data visualization techniques and designed interactive dashboards using Power BI to present complex reports, charts, summaries, and graphs to team members and stakeholders.
- Developed custom scripts and tools using Oracle's PL/SQL language to automate data validation, cleansing, and transformation processes, ensuring data accuracy and quality.
- Experienced in loading logs from multiple sources into HDFS using Flume.
- Utilized JIRA to manage project workflows, track issues, and collaborate effectively with cross-functional teams.
- Implemented Spark using Python (PySpark) and Spark SQL for faster data testing and processing, enabling efficient data analysis and insight generation.
- Experience in integrating Apache YARN (Yet Another Resource Negotiator) with other Apache ecosystem tools and frameworks.
- Demonstrated expertise utilizing Apache Flink to create batch and real-time stream processing systems.
- Employed Spark Streaming to divide streaming data into batches as input to the Spark engine for batch processing, facilitating real-time data processing and analytics.
- Utilized Zookeeper for coordination, synchronization, and serialization of servers within clusters, ensuring efficient and reliable distributed data processing.
- Spearheaded the Oozie workflow engine for job scheduling, enabling seamless execution and management of data processing workflows.
- Utilized Data Analysis Expressions (DAX) to create complex calculations, measures, and calculated columns within Power BI.
- Optimized Power Apps solutions for performance and scalability, including minimizing data latency.
- Utilized Kubernetes for container orchestration and scheduling, dynamic resource allocation, and automated deployment of data processing applications.
- Leveraged Git as a version control tool to maintain code repositories, ensuring efficient collaboration, version tracking, and code management.
- Responsible for triggering jobs using Control-M.
Environment: Sqoop, PL/SQL, HDFS, Cloudera, Hortonworks, Netezza, HiveQL, Apache Spark, Apache Flink, Apache YARN, Scala, Hive, Hadoop, HBase, Flume, Kafka, MapReduce, Zookeeper, Oozie, RDBMS, DAX, Python, Power Apps, Control-M, Kubernetes, PySpark, Git, JIRA, Power BI
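A small sketch of the Spark aggregation-into-Hive pattern described in this section, written in PySpark for consistency with the other examples (the original work also used Scala); the database, table, and column names are invented.

```python
from pyspark.sql import SparkSession, functions as F

# Hive support lets saveAsTable register the result in the Hive metastore.
spark = (
    SparkSession.builder
                .appName("daily-sales-aggregation")
                .enableHiveSupport()
                .getOrCreate()
)

# Source data previously landed in HDFS (path and schema are placeholders).
orders = spark.read.parquet("hdfs:///data/retail/orders/")

# Aggregate order amounts per store per day.
daily_sales = (
    orders.groupBy("store_id", "order_date")
          .agg(
              F.sum("amount").alias("total_sales"),
              F.countDistinct("order_id").alias("order_count"),
          )
)

# Persist the aggregate into the Hive warehouse, partitioned by date.
(
    daily_sales.write
               .mode("overwrite")
               .partitionBy("order_date")
               .format("parquet")
               .saveAsTable("retail_analytics.daily_store_sales")
)
```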
Client: ANTHEM, Chicago, IL                                                   Jan 2014 to Dec 2015
Role: Data Warehouse Developer
Responsibilities:
- Extensively used Informatica client tools such as Source Analyzer, Warehouse Designer, Mapping Designer, and Mapplet Designer.
- Analyzed, designed, constructed, and implemented ETL jobs using Informatica.
- Leveraged Apache NiFi to orchestrate complex ETL workflows, facilitating seamless data flow from various sources into the data warehouse.
- Developed mappings, transformations, and mapplets using the Mapping Designer, Transformation Developer, and Mapplet Designer in Informatica PowerCenter 8.x.
- Designed and established data models using Star schema and Snowflake schema, ensuring data is organized efficiently.
- Used Informatica PowerCenter and PowerExchange for extracting, transforming, and loading data from relational and non-relational sources.
- Proficient in using Visual Studio as an Integrated Development Environment (IDE) for developing and maintaining data engineering solutions.
- Expertise in designing ETL data flows using SSIS, creating mappings and workflows for extracting data from SQL Server, and performing data migration and transformation from Access/Excel sheets using SQL Server SSIS.
- Performed data enrichment, aggregation, and cleansing using SSIS data flow transformations such as Derived Column, Lookup, Conditional Split, and Aggregate.
- Proficient in dimensional data modeling for data mart design, identifying facts and dimensions, and developing fact tables and dimension tables using Slowly Changing Dimension (SCD) techniques.
- Well versed in integrating Talend with various data warehouses and databases such as Oracle, SQL Server, MySQL, Teradata, and Snowflake.
- Proficient in orchestrating and scheduling Talend jobs using Talend Administration Center (TAC) or other scheduling tools.
- Experienced in building cubes and dimensions with different architectures and data sources for business intelligence purposes, including writing MDX scripts.
- Enhanced teamwork and communication by employing SharePoint functionalities including discussion boards, team calendars, and announcements.
- Skilled in optimizing OLAP and OLTP tables for query performance and data processing efficiency.
- Proficient in developing SSAS (SQL Server Analysis Services) cubes, implementing aggregations, defining KPIs (Key Performance Indicators), managing measures, partitioning cubes, creating data mining models, and deploying and processing SSAS objects.
- Expertise in developing parameterized, chart, graph, linked, dashboard, scorecard, and drill-down/drill-through reports on SSAS cubes using SSRS (SQL Server Reporting Services).
- Designed basic Unix scripts and automated them to run workflows daily, weekly, and monthly.
- Executed various transformations such as Source Qualifier, Expression, Lookup, Sequence Generator, Aggregator, Update Strategy, and Joiner while migrating data from heterogeneous sources such as Oracle, DB2, XML, and flat files.
- Tested data integrity among various sources and targets. Worked with the production support team on various performance-related issues and improved performance by 45%.
- Hands-on experience with data compression techniques to reduce storage space and with data replication strategies for data redundancy.
- Adept at shell scripting (Bash or PowerShell) for automating data processing tasks, including data ingestion.
- Familiar with data modeling tools such as ER Studio and Erwin.
- Involved in unit testing, integration testing, system testing, source control, and environment-specific script deployment tracking using Team Foundation Server (TFS).
- Configured Maven and SBT build scripts to automatically download and manage project dependencies.
- Excellent T-SQL programming skills, including complex stored procedures, views, user-defined functions, triggers, cursors, table variables, Common Table Expressions (CTEs), and windowing functions (see the query sketch at the end of this section).

Environment: Informatica PowerCenter 8.x, Visual Studio, Star and Snowflake schema, Informatica PowerExchange, Oracle, DB2, Unix shell scripts, SQL, T-SQL, Team Foundation Server (TFS), ER Studio, Erwin, Apache NiFi, SSIS, SSAS, SSRS, SCD techniques, Talend, TAC, MDX scripting, OLAP and OLTP tables, SharePoint, Teradata, MySQL, Maven, SBT
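For the T-SQL work noted above, a brief illustration of a CTE combined with a windowing function, executed here through pyodbc to keep the examples in Python; the connection string, table, and columns are hypothetical.

```python
import pyodbc

# Hypothetical SQL Server connection string.
conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=sqlserver.example.com;DATABASE=SalesDW;Trusted_Connection=yes;"
)

# CTE + ROW_NUMBER() window to pick each customer's most recent order.
LATEST_ORDER_SQL = """
WITH ranked_orders AS (
    SELECT
        customer_id,
        order_id,
        order_date,
        amount,
        ROW_NUMBER() OVER (
            PARTITION BY customer_id
            ORDER BY order_date DESC
        ) AS rn
    FROM dbo.FactOrders
)
SELECT customer_id, order_id, order_date, amount
FROM ranked_orders
WHERE rn = 1;
"""

cursor = conn.cursor()
for customer_id, order_id, order_date, amount in cursor.execute(LATEST_ORDER_SQL):
    print(customer_id, order_id, order_date, amount)

conn.close()
```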
