Candidate's Name
Senior Azure Data Engineer
Phone: PHONE NUMBER AVAILABLE
Email: EMAIL AVAILABLE
LinkedIn: https://LINKEDIN LINK AVAILABLE

SUMMARY:
- Over 10 years of IT experience designing and implementing solutions for complex business problems; successfully migrated on-premises data infrastructures to Azure Cloud Services, leveraging Big Data technologies and the Azure tech stack, including Azure Data Factory, Azure Synapse Analytics, and Azure Databricks, to ensure data quality and integrity.
- Proficient in designing and orchestrating end-to-end data pipelines using Azure Data Factory, integrating with Azure Blob Storage, Azure Data Lake Storage Gen2, and Azure Synapse Analytics for seamless data movement and transformation.
- Expertise in designing cloud-based data warehouse solutions using Snowflake on Azure, optimizing schemas, tables, and views for efficient data storage and retrieval, and leveraging Azure Data Factory for data integration.
- Implemented scalable data storage solutions using Azure Blob Storage and Azure Data Lake Storage Gen2, optimizing data accessibility, reliability, and performance while ensuring data governance and security using Azure Purview.
- Integrated Azure Logic Apps into data workflows for seamless automation, enabling event-driven data processing, improved efficiency, and enhanced data integration across Azure services such as Azure Data Factory and Azure Functions.
- Developed advanced data transformations and processing with Azure Databricks and PySpark, enabling efficient handling of complex data workflows, and integrated with Azure Data Factory and Azure Synapse Analytics for end-to-end data solutions.
- Enhanced data warehousing on Snowflake on Azure, ensuring scalability, multi-cloud flexibility, secure collaboration, time travel, versioning, and seamless integration with Azure Data Factory and Azure Synapse Analytics.
- Spearheaded the migration of legacy on-premises data systems to the Azure cloud, using a phased approach, automated deployment pipelines, and Azure Data Factory for a seamless transition, minimizing business disruption and optimizing infrastructure costs.
- Architected a modern cloud data platform using Azure storage, compute, and analytics services, including Azure Data Lake Storage Gen2, Azure Synapse Analytics, and Azure Databricks, delivering cost savings while preserving on-premises capabilities.
- Implemented a real-time data ingestion and processing solution using Azure Event Hubs, Azure Functions, and PySpark on Azure Databricks for rapid analysis of streaming data, enabling time-sensitive use cases like fraud detection and supply chain optimization.
- Demonstrated deep expertise in Azure Functions for serverless computing, enabling event-triggered data processing tasks, enhancing flexibility, and integrating seamlessly with Azure Event Hubs and Azure Synapse Analytics.
- Implemented and managed Apache Hadoop and Spark clusters on Azure HDInsight for scalable and efficient big data processing, leveraging Azure Data Lake Storage Gen2 for data storage and Azure Monitor for cluster monitoring.
- Proficient in Azure Synapse Analytics for large-scale data processing and analytics, delivering high-performance, comprehensive data insights, and integrating with Power BI for interactive data visualization and reporting.
- Enabled interactive data exploration and ad-hoc analysis with Azure Synapse Analytics, leveraging serverless SQL pools and integrating with Power BI for visually appealing reports and dashboards, empowering business users with self-service analytics (a minimal query sketch follows this item).
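As a hedged illustration of the serverless SQL pattern in the preceding bullet: a minimal Python sketch that runs an ad-hoc OPENROWSET query against a Synapse serverless SQL pool through the pyodbc driver. The workspace name, storage account, and lake path are hypothetical placeholders, not values from the resume.

```python
# Minimal sketch: ad-hoc query against a Synapse serverless SQL pool over ADLS Gen2.
# Server, storage account, and path are hypothetical placeholders.
import pyodbc

conn = pyodbc.connect(
    "Driver={ODBC Driver 18 for SQL Server};"
    "Server=myworkspace-ondemand.sql.azuresynapse.net;"  # hypothetical workspace
    "Database=master;"
    "Authentication=ActiveDirectoryInteractive;"
    "Encrypt=yes;"
)

# OPENROWSET lets the serverless pool read Parquet files in the lake directly,
# with no ETL or prior data modeling.
query = """
SELECT TOP 10 *
FROM OPENROWSET(
    BULK 'https://mydatalake.dfs.core.windows.net/curated/sales/*.parquet',
    FORMAT = 'PARQUET'
) AS sales
"""

with conn:
    for row in conn.execute(query).fetchall():
        print(row)
```

Because the pool reads the files in place, business users can explore curated data without waiting for a modeled warehouse layer.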
- Established comprehensive data governance across diverse sources using Azure Purview, ensuring data lineage, access control, compliance, and data discovery while simplifying data management within the Azure ecosystem.
- Implemented Azure Synapse Analytics serverless SQL pools for ad-hoc, interactive data exploration and analysis, empowering business users to derive insights from data lakes without complex data modeling or ETL processes.
- Integrated Azure Synapse Analytics with Power BI to create a comprehensive, end-to-end business intelligence solution, delivering self-service reporting and dashboarding capabilities and enabling data-driven decision-making across the organization.
- Designed a multi-layered data lake architecture on Azure Data Lake Storage Gen2, incorporating best practices for data partitioning, versioning, access control, and data lifecycle management to support growing analytical and AI/ML workloads.
- Deep expertise in scripting languages such as Python and Scala for efficient data manipulation and processing tasks.
- Hands-on experience in the Hadoop ecosystem: managing HDFS and YARN, orchestrating data processing with MapReduce, and developing complex analytics with Apache Spark.
- Proficient in data ingestion using Sqoop and Flume; cluster coordination with ZooKeeper; data management and querying with Hive, Pig, and HBase; event streaming with Kafka; and efficient workflow scheduling with Oozie.
- Experience in database architecture for relational OLAP and OLTP applications, database design, data migration, and data warehousing concepts with an emphasis on ETL; configured MOLAP storage and partitioning strategies to optimize performance and maintain data consistency across large datasets.
- Experience in data modeling and data analysis using dimensional and relational data modeling.
- Defined user stories and drove the Agile board in JIRA during project execution; participated in sprint demos and retrospectives.
- Maintained and administered Git source code repositories and GitHub Enterprise.
- Collaborated closely with data analysts and stakeholders to implement effective data models and structures aligned with business needs.

ADDITIONAL INFORMATION:
Cloud Computing: Azure Data Factory (ADF), Azure Data Lake Storage (ADLS), Azure Synapse Analytics (SQL Data Warehouse), Azure SQL Database, Azure Databricks, PolyBase, Azure Cosmos DB, Azure Key Vault, Azure DevOps, Function Apps, Logic Apps, Azure Purview
Languages: Python, SQL, HiveQL, PySpark, Scala, Pig
Databases: MySQL, Oracle, MS SQL Server, Teradata, PostgreSQL, MongoDB
Big Data: Sqoop, Hive, HBase, Flume, Hadoop, Kafka, Apache Spark, Hortonworks, Cloudera, Keycloak
CI/CD Tools: Terraform, Apache Airflow, Jenkins
Version Control: Git, Bitbucket
Build Tools: Maven, SBT
File Formats: CSV, JSON, XML, ORC, Parquet, Delta
Other Tools: Visual Studio, SQL Navigator, SQL Server Management Studio, Eclipse, Postman

WORK EXPERIENCE:
Sr. Azure Data Engineer  Jul 2022 - Current
State of New York, Albany, NY.
Responsibilities:
- Implemented data processing workflows in Azure Data Factory pipelines, leveraging activities such as Copy to move data between diverse sources and destinations, Filter to extract subsets of data based on defined conditions, and ForEach to automate iterative tasks, optimizing processing efficiency and workflow automation.
- Developed end-to-end ETL processes in Azure Data Factory to migrate data from diverse sources into Azure Synapse Analytics, increasing data pipeline throughput by 2x.
- Implemented robust error handling, logging, and alerting mechanisms within Azure Data Factory pipelines and Azure Functions, ensuring proactive issue detection and resolution through centralized monitoring and automated notifications.
- Leveraged Azure Databricks to transform raw data into business-ready formats; implemented Delta Lake architecture on Azure Databricks for efficient data lake storage and ensured ACID compliance for data operations (see the PySpark sketch below).
- Utilized PySpark within Azure Databricks for advanced data transformations, efficiently handling large-scale data processing tasks.
- Executed a proof of concept (POC) on Microsoft Fabric, showcasing its potential for creating unified data pipelines, integrating advanced analytics, and improving data governance across cloud environments.
- Implemented Microsoft Fabric to unify data management and analytics across on-premises and cloud environments, improving data accessibility and consistency.
- Architected a modern cloud data platform leveraging Azure storage, compute, and analytics services, delivering TCO savings of about 10% while fully preserving existing on-premises capabilities.
- Designed and implemented data pipelines on the Databricks Lakehouse Platform, enabling efficient ingestion, transformation, and analysis of large-scale structured and unstructured datasets.
- Loaded transformed data into curated datasets within Synapse Analytics, with partitioning and clustering based on expected query patterns for optimized data access and performance.
- Enabled interactive data exploration and ad-hoc analysis with Azure Synapse Analytics and integrated with Power BI for visually appealing reports and dashboards to share insights with stakeholders.
- Implemented a high-throughput messaging solution using Azure Service Bus, processing over 1 million messages at peak time with 99.9% reliability and enabling seamless communication between microservices in a distributed architecture.
- Optimized message routing and load balancing in Azure Service Bus, reducing message processing latency by 40% and improving overall system performance in a large-scale data integration project.
- Leveraged Azure Data Lake Storage (ADLS) Gen2 as a scalable and efficient foundation for data movement and processing within ETL pipelines, enabling seamless integration and data management across diverse Azure services.
- Leveraged advanced ADLS Gen2 features, including the hierarchical namespace and fine-grained access control, to enforce robust data governance policies and ensure regulatory compliance within the ETL project.
- Established comprehensive data governance across diverse sources via Azure Purview, ensuring data lineage, access control, and compliance while simplifying data discovery and trust.
- Developed and implemented complex SQL queries and stored procedures to extract, transform, and load data into data warehouses and data marts.
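To make the Delta Lake bullet above concrete: a minimal PySpark sketch of an ACID upsert (MERGE) into a Delta table on Databricks. The ADLS paths, key column, and incoming dataset are hypothetical placeholders, not the project's actual code.

```python
# Minimal sketch of a Delta Lake upsert (MERGE) on Azure Databricks.
# Paths and the join key are hypothetical placeholders.
from delta.tables import DeltaTable
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()  # pre-configured on Databricks

# Incoming batch of changed records (normally read from a landing zone).
updates = spark.read.format("parquet").load(
    "abfss://landing@mydatalake.dfs.core.windows.net/customers/"
)

target = DeltaTable.forPath(
    spark, "abfss://curated@mydatalake.dfs.core.windows.net/customers_delta/"
)

# MERGE gives ACID semantics: matched rows are updated, new rows inserted,
# and readers never observe a partially applied batch.
(
    target.alias("t")
    .merge(updates.alias("s"), "t.customer_id = s.customer_id")
    .whenMatchedUpdateAll()
    .whenNotMatchedInsertAll()
    .execute()
)
```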
- Configured Unity Catalog to enable role-based access control (RBAC) and dynamic access policies, ensuring proper permissions and data visibility for different user groups in line with organizational policies.
- Implemented data security measures using .NET and Azure Key Vault to securely manage and encrypt sensitive information.
- Implemented ETL processes, writing and optimizing SQL queries to extract and merge data from SQL Server databases.
- Automated data classification and labeling across diverse sources with Azure Purview, ensuring consistent data protection and regulatory adherence.
- Enhanced infrastructure deployment processes by implementing best practices in Infrastructure as Code (IaC), reducing errors and increasing the reliability of production environments.
- Enhanced data warehousing on Snowflake on Azure, ensuring scalability, multi-cloud flexibility, secure collaboration, time travel, versioning, and integration, including star schema and snowflake schema design for efficient analytics.
- Designed and deployed Snowflake stages to import data from diverse sources, managing transient, temporary, and persistent Snowflake tables to support efficient data processing workflows.
- Leveraged connectors to integrate Logic Apps with diverse data sources, APIs, and cloud services, enabling efficient data exchange and triggering actions based on events.
- Centralized data management using Unity Catalog for structured and unstructured data, streamlining the discovery, classification, and governance of data assets across multiple environments.
- Built RESTful APIs using ASP.NET Core to enable seamless data access and interaction for front-end applications and other services.
- Worked extensively with Event Hubs for event-driven architectures, real-time data processing via Stream Analytics and Functions, and data retention via Event Hubs Capture to Blob Storage and Data Lake Storage.
- Leveraged Azure Key Vault for secure storage and retrieval of sensitive credentials, Azure Monitor for comprehensive monitoring and alerting, and Apache Airflow for orchestrating complex data pipelines.
- Enabled centralized identity and access management through Azure Active Directory integration, and adopted DevOps practices with Azure DevOps for version control, continuous integration, and automated deployments.
- Integrated Active Directory (AD) and Microsoft Entra ID into data engineering solutions, ensuring secure user authentication and access management across on-premises and cloud environments.
- Designed and deployed Spark applications in Python, leveraging PySpark's resilient distributed datasets (RDDs) and DataFrame APIs to handle large-scale data processing tasks with high throughput and fault tolerance.
- Developed and executed efficient PySpark scripts for processing datasets, incorporating Spark SQL for structured data analysis and creating DAGs for workflow orchestration.
- Built data processing pipelines using Python, leveraging libraries such as Pandas and NumPy for data manipulation and analysis (a brief sketch follows this item).
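A brief sketch of the kind of Pandas/NumPy cleaning step such pipelines typically contain; the file names and columns are hypothetical placeholders.

```python
# Minimal sketch of a Pandas/NumPy cleaning step in a data pipeline.
# Column names and file paths are hypothetical placeholders.
import numpy as np
import pandas as pd

df = pd.read_csv("raw_orders.csv", parse_dates=["order_date"])

# Standardize text keys, coerce bad numerics to NaN, and drop unusable rows.
df["customer_id"] = df["customer_id"].str.strip().str.upper()
df["amount"] = pd.to_numeric(df["amount"], errors="coerce")
df = df.dropna(subset=["customer_id", "amount"])

# Simple derived feature: log-scaled amount for downstream analysis.
df["log_amount"] = np.log1p(df["amount"])

df.to_parquet("clean_orders.parquet", index=False)
```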
- Created interactive Power BI dashboards and reports using Data Analysis Expressions (DAX), effectively visualizing key metrics and communicating trends for improved decision-making.
- Designed and deployed advanced reporting functionality in Power BI using DAX, including drill-through and drill-down reports with interactive drop-down menus, data sorting, and subtotals for enhanced data analysis.
- Worked extensively with UNIX/Linux shell scripting and PowerShell for data manipulation and automation.
- Documented Terraform workflows and IaC practices, providing detailed guidelines for the team to manage cloud infrastructure efficiently and securely.
- Developed and maintained Terraform scripts for provisioning Azure infrastructure, enabling consistent and repeatable deployments across multiple environments.
- Implemented CI/CD pipelines using Jenkins and Git hooks for automated code integration, testing, and adherence to coding standards, streamlining agile development processes; conducted sprint reviews and retrospectives to drive continuous improvement in delivering data solutions.
Environment: Azure Data Lake Storage Gen2, Azure Data Factory, Azure Purview, Azure Event Hubs, Azure SQL Server, Azure Synapse Analytics, Azure Blob Storage, Microsoft Fabric, Azure Key Vault, Azure Logic Apps, Azure Function Apps, Azure Analysis Services, Azure Service Bus, Snowflake, Oracle, PySpark, Scala, Python, Spark SQL, SnowSQL, Power BI, GitHub, Agile methodology, JIRA.

Azure Data Engineer and Production Support Engineer  Sept 2017 - Jun 2022
Brillio, Dallas, TX.
Responsibilities:
- Developed Spark applications using PySpark, Spark SQL, and Scala on Databricks, performing data transformations, conversions, and aggregations to meet functional requirements.
- Leveraged Azure Databricks and PySpark to develop advanced data processing workflows that identify patterns and anomalies in large, complex datasets, driving data-driven decision-making for mission-critical business initiatives.
- Designed and deployed a multi-tenant data platform on Azure Synapse Analytics and Azure Databricks, providing self-service analytics capabilities to business users while maintaining robust security, access control, and data isolation.
- Utilized Azure Data Factory to orchestrate complex, end-to-end data workflows that integrated on-premises data sources with cloud-based data services such as Azure Databricks and Azure Synapse Analytics, enabling enterprise-wide data modernization.
- Implemented a real-time data ingestion and processing solution using Azure Event Hubs, Azure Functions, and PySpark on Azure Databricks, enabling rapid analysis of streaming data for time-sensitive use cases like fraud detection and supply chain optimization.
- Leveraged the native integration between Azure Data Factory and Azure Databricks to develop efficient, scalable data processing pipelines that combined the strengths of both platforms, delivering superior performance and flexibility for advanced analytics initiatives.
- Developed a self-service data preparation and exploration solution using Power BI and Azure Databricks notebooks, empowering business users to access, blend, and analyze data from multiple sources without relying on IT support, accelerating data-driven innovation.
- Designed and implemented robust ETL processes to extract data from Salesforce, transform it according to business requirements, and load it into target data stores using Azure Data Factory (ADF).
- Implemented a secure, multi-tenant data platform architecture on Azure Synapse Analytics and Azure Databricks, incorporating Azure Active Directory for user authentication and authorization, data masking, and row-level security to govern access to sensitive information.
- Implemented a robust data masking and anonymization strategy using Azure Data Factory and Azure Databricks, safeguarding the privacy and security of sensitive information while maintaining data utility for analytical purposes (a minimal masking sketch follows this item).
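A minimal sketch of the column-level masking pattern in the preceding bullet, assuming PySpark on Databricks; the table and column names are hypothetical placeholders.

```python
# Minimal sketch of column-level masking/anonymization in PySpark.
# Table and column names are hypothetical placeholders.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()

customers = spark.read.table("raw.customers")

masked = (
    customers
    # One-way hash preserves joinability without exposing the raw email.
    .withColumn("email_hash", F.sha2(F.col("email"), 256))
    # Partial redaction keeps the last 4 digits for support lookups.
    .withColumn("phone_masked", F.concat(F.lit("***-***-"), F.substring("phone", -4, 4)))
    .drop("email", "phone")
)

masked.write.mode("overwrite").saveAsTable("curated.customers_masked")
```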
- Designed and implemented a highly scalable data processing pipeline using Azure Data Factory to ingest, transform, and load data from disparate sources, including on-premises systems and cloud-based SaaS applications, enabling enterprise-wide data consolidation and harmonization.
- Designed and implemented advanced data quality monitoring and alerting within Azure Data Factory pipelines, leveraging Azure Monitor and custom data quality rules to proactively identify and address data anomalies, ensuring the reliability of mission-critical data.
- Designed and deployed a centralized data monitoring and alerting solution using Azure Data Factory, Azure Monitor, and Azure Logic Apps, proactively identifying data processing delays, pipeline failures, and other anomalies to ensure reliable, timely data deliveries.
- Architected a centralized, high-performance data warehouse using Azure Synapse Analytics, optimizing database schema design, indexing, and SQL query performance to support sub-second query response times for business intelligence and analytical workloads.
- Conducted detailed root cause analysis on recurring data issues, implementing long-term solutions to prevent future incidents and reduce system downtime.
- Empowered business users with ad-hoc data exploration through Azure Synapse Analytics serverless SQL pools, while migrating on-premises workloads to Azure Synapse Analytics using Azure Synapse Pathway, optimizing cloud-native data analytics.
- Integrated Azure Synapse Analytics with Power BI to create a comprehensive, end-to-end business intelligence solution, delivering self-service reporting and dashboarding capabilities to business stakeholders and enabling data-driven insights.
- Deployed Snowflake on Azure, utilizing SnowSQL for scalable data querying and management, enhancing analytics with features such as automatic scaling and native support for semi-structured data.
- Enhanced data warehousing on Snowflake on Azure, ensuring scalability, multi-cloud flexibility, secure collaboration, time travel, versioning, and integration, including star schema and snowflake schema design for efficient analytics.
- Utilized Python data science libraries such as Pandas, NumPy, and SciPy to perform sophisticated data manipulation, cleaning, and feature engineering on massive datasets stored in Azure Data Lake Storage (ADLS), delivering actionable insights to stakeholders.
- Integrated Azure Logic Apps and Azure Functions to create a serverless, event-driven data processing pipeline that automatically triggered ingestion, transformation, and load processes in response to real-time data events, ensuring timely availability of mission-critical information.
- Automated infrastructure provisioning using Azure services, improving deployment speed and consistency across development and production environments.
- Developed a comprehensive data governance framework within the Azure ecosystem, leveraging SQL data modeling techniques, data cataloging, and lineage tracking to ensure the reliability, integrity, and compliance of enterprise data assets.
- Implemented a scalable, fault-tolerant data ingestion architecture using Azure Event Hubs and Azure Functions, providing a highly available, resilient platform for capturing real-time data streams from IoT devices, web applications, and other distributed sources (a minimal streaming sketch follows this item).
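A minimal sketch of the streaming ingestion pattern above, reading Event Hubs through its Kafka-compatible endpoint with PySpark Structured Streaming; the namespace, hub name, paths, and connection string are hypothetical placeholders.

```python
# Minimal sketch of reading an Event Hubs stream with PySpark Structured Streaming
# via Event Hubs' Kafka-compatible endpoint. All names below are placeholders.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()

# Event Hubs speaks the Kafka protocol on port 9093; authentication uses the
# connection string as the SASL password.
raw = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "myeventhubns.servicebus.windows.net:9093")
    .option("subscribe", "telemetry")
    .option("kafka.security.protocol", "SASL_SSL")
    .option("kafka.sasl.mechanism", "PLAIN")
    .option(
        "kafka.sasl.jaas.config",
        'org.apache.kafka.common.security.plain.PlainLoginModule required '
        'username="$ConnectionString" password="<EVENT_HUBS_CONNECTION_STRING>";',
    )
    .load()
)

events = raw.select(F.col("value").cast("string").alias("body"))

# Land the stream in the lake for downstream time-sensitive analysis.
query = (
    events.writeStream.format("delta")
    .option("checkpointLocation", "abfss://chk@mydatalake.dfs.core.windows.net/telemetry/")
    .start("abfss://bronze@mydatalake.dfs.core.windows.net/telemetry/")
)
```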
- Designed and implemented a multi-layered data lake architecture on Azure Data Lake Storage Gen2, incorporating best practices for data partitioning, versioning, and access control to support the organization's growing analytical and AI/ML workloads.
- Collaborated closely with cross-functional teams, including business analysts, domain experts, and IT stakeholders, to align data engineering initiatives with strategic objectives and ensure the delivery of impactful data solutions.
- Provided technical leadership and mentorship to junior data engineers, sharing best practices, conducting code reviews, and fostering a culture of continuous learning and innovation within the data engineering team.
- Developed and implemented automated workflows using Power Apps and Power Automate, significantly improving the efficiency and responsiveness of data-driven processes, reducing manual effort, and minimizing errors.
- Integrated OAuth 2.0 and OIDC with ETL processes, ensuring secure and seamless access to external data sources while maintaining strong authentication protocols.
- Designed and maintained GraphQL schemas to create standardized query patterns, providing a robust and adaptable API layer for front-end developers.
- Stayed abreast of the latest advancements in Azure data services, data engineering methodologies, and industry trends, consistently evaluating new technologies and approaches to enhance the organization's data capabilities.
- Spearheaded the migration of legacy on-premises data systems to the Azure cloud, leveraging a phased approach and automated deployment pipelines to ensure a seamless transition, minimize business disruption, and optimize infrastructure costs.
- Designed and structured Confluence spaces, pages, and permissions, aligning with organizational needs and access controls.
- Optimized Git workflows for efficient branching, merging, and conflict resolution, minimizing disruptions and enhancing team productivity.
- Orchestrated sprint planning sessions, creating backlogs and assigning tasks to team members based on capacity and skill sets.
Environment: Azure Databricks, Azure Data Factory, Azure SQL, Azure Functions, Azure Event Hubs, Azure Logic Apps, Azure Key Vault, Azure Synapse Analytics, Snowflake, Microsoft Power BI, Azure Monitor, SnowSQL, MS SQL, MongoDB, SQL, Python, Scala, PySpark, shell scripting, Git, Jenkins, ADF pipelines, Confluence.

Big Data Developer  Dec 2015 - Aug 2017
Kroger, Cincinnati, Ohio.
Responsibilities:
- Developed Spark applications using PySpark and Spark SQL for data extraction, transformation, and aggregation from multiple file formats to uncover insights into customer usage patterns.
- Migrated data from different databases (Netezza, Oracle, SQL Server) to Hadoop and developed Java-based code to build services.
- Managed code releases by running functional test suites, fixing bugs, and coordinating with cross-functional teams.
- Imported and exported data between HDFS and Teradata using Sqoop, and managed data from different sources.
- Analyzed the Hadoop cluster and various Big Data analytic tools, including Pig, Hive, HBase, Spark, and Sqoop, and managed ZooKeeper configurations for optimal performance in a distributed environment.
- Developed Apache Spark jobs using Scala and Spark SQL, migrated HiveQL queries to Spark SQL, and analyzed data using Hive and Pig (a minimal migration sketch follows this item).
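A minimal sketch of the HiveQL-to-Spark SQL migration pattern above: the same aggregation that once ran in Hive, executed by Spark with Hive metastore support. The database, table, and output path are hypothetical placeholders.

```python
# Minimal sketch of running a former HiveQL query through Spark SQL.
# Database, table, and path names are hypothetical placeholders.
from pyspark.sql import SparkSession

# enableHiveSupport lets Spark read tables registered in the Hive metastore.
spark = (
    SparkSession.builder
    .appName("hiveql-to-sparksql")
    .enableHiveSupport()
    .getOrCreate()
)

# The same aggregation that previously ran in Hive, now executed by Spark SQL.
usage = spark.sql("""
    SELECT customer_id,
           COUNT(*)    AS visits,
           SUM(amount) AS total_spend
    FROM sales.transactions
    GROUP BY customer_id
    HAVING COUNT(*) > 10
""")

usage.write.mode("overwrite").parquet("/data/curated/customer_usage/")
```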
- Wrote Oozie workflows, shell scripts, and MapReduce code for automation and log data parsing, and optimized jobs using compression techniques.
- Configured Jenkins pipelines for continuous integration and delivery, and orchestrated containerized applications using Kubernetes.
- Created and optimized Hive tables, converted SQL queries to HiveQL, and developed UDFs and Oozie workflows for ETL processes.
- Integrated SSO solutions using Keycloak across web applications, ensuring a unified authentication process for users via SAML, OAuth2, and OpenID Connect protocols.
- Utilized Spark Streaming with Kafka for near-real-time data processing, implemented Spark RDD transformations and actions, and used Data Analysis Expressions (DAX) in Power BI for complex calculations and measures.
- Configured Bitbucket for centralized source code management, leveraging features such as pull requests, branching strategies, and issue tracking.
- Adapted to an agile development environment, actively participating in daily stand-ups, sprint planning, and retrospective meetings to foster collaboration and continuous improvement.
Environment: Hadoop, Spark, Hive, Pig, Sqoop, HBase, Oozie, Ambari, Oracle, Netezza, SQL Server, Teradata, Kafka, Trifacta, Scala, Java, PySpark, Spark SQL, Spark Streaming, Spark RDD, HDFS, YARN, MapReduce, UNIX scripting, Oracle 8.0, SQL, PL/SQL, MS Access, Informatica, MS Excel.

ETL Developer  Jan 2014 - Nov 2015
JPMC, Dallas, TX.
Responsibilities:
- Gathered requirements from business system analysts, prepared design documents, and worked with T-SQL (DDL, DML) to create complex stored procedures, views, tables, user-defined functions, indexes, and relational database models.
- Designed and created staging databases for data processing, consolidation, and loading into central repositories using SQL Server Integration Services (SSIS).
- Developed ETL packages with various data sources (SQL Server, flat files, Excel, Oracle, DB2) and performed transformations using SSIS packages.
- Optimized T-SQL queries using SQL Server Profiler and SSMS, and implemented and maintained Postgres databases for optimal performance and reliability.
- Generated reports using SQL Server Reporting Services (SSRS) with features such as charts, drill-down, sub-reports, cascading, and parameterized reports.
- Developed data integration solutions using Informatica PowerCenter mappings and SSIS packages, optimizing performance, handling errors, and ensuring data quality across sources while collaborating with stakeholders.
- Created SSIS packages for loading data from flat files, Excel, and XML files into the data warehouse and report data mart using various transformations (Lookup, Derived Column, Sort, Aggregate, Pivot, Slowly Changing Dimension).
- Developed SCD-type mappings for loading data from source into dimension tables and facts, and migrated data from MS Access to SQL Server (a minimal SCD Type 2 sketch appears after this section).
- Collaborated with offshore and QA teams on coding and unit testing, and with end users on UAT and production deployment.
- Worked on production support activities, identifying and translating business requirements into logical and physical data models, and collaborated with development teams on code quality, debugging, and version control within Visual Studio.
- Conducted ETL performance tuning, troubleshooting, support, and capacity estimation; worked with .NET developers on UAT testing; and created dashboards and reports using Tableau.
- Followed the Waterfall software development life cycle (SDLC), including requirement gathering, design, implementation, testing, and deployment phases.
Environment: SQL Server, Oracle, SQL Server Management Studio, MVC, Team Foundation Server (TFS), T-SQL, SSIS, SSRS, MS Visual Studio 2008/2010, MS Access, MS Office, Tableau, FootPrints, SharePoint.
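To illustrate the SCD Type 2 pattern referenced in this role: in practice the SSIS Slowly Changing Dimension component handled it, but the logic can be sketched in a few lines of pandas. The frames, columns, and dates below are hypothetical.

```python
# Minimal sketch of Slowly Changing Dimension Type 2 logic in pandas.
# Frames, columns, and dates are hypothetical placeholders.
import pandas as pd

TODAY = pd.Timestamp("2015-06-01")
HIGH_DATE = pd.Timestamp("9999-12-31")

dim = pd.DataFrame({
    "customer_id": [1, 2],
    "city": ["Dallas", "Austin"],
    "valid_from": pd.Timestamp("2014-01-01"),
    "valid_to": HIGH_DATE,
    "is_current": True,
})
incoming = pd.DataFrame({"customer_id": [2, 3], "city": ["Houston", "Plano"]})

current = dim[dim["is_current"]]
merged = incoming.merge(current, on="customer_id", how="left", suffixes=("", "_old"))

changed = merged[merged["city_old"].notna() & (merged["city"] != merged["city_old"])]
new = merged[merged["city_old"].isna()]

# Expire the old version of changed rows...
dim.loc[dim["customer_id"].isin(changed["customer_id"]) & dim["is_current"],
        ["valid_to", "is_current"]] = [TODAY, False]

# ...then append new versions for changed keys and brand-new keys.
additions = pd.concat([changed, new])[["customer_id", "city"]].assign(
    valid_from=TODAY, valid_to=HIGH_DATE, is_current=True
)
dim = pd.concat([dim, additions], ignore_index=True)
print(dim)
```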
EDUCATION:
Bachelor of Technology in Electronics and Communication Engineering, JNTUH, India, May 2011.
Master's in Computer Science, University of Memphis, Dec 2013.

CERTIFICATIONS:
Azure Fundamentals (DP-900)
Azure Data Engineer Associate (DP-203)