Candidate's Name
Azure Data Engineer
PHONE NUMBER AVAILABLE
EMAIL AVAILABLE

PROFESSIONAL SUMMARY:
Certified Azure Data Engineer with 5+ years of experience in designing and implementing scalable data ingestion pipelines using Microsoft Azure Cloud, Python, PySpark, and Big Data technologies.
Hands-on experience with Azure cloud services (PaaS & IaaS) including Azure Synapse Analytics, SQL Azure, Data Factory, Azure Analysis Services, Application Insights, Azure Monitor, Key Vault, and Azure Data Lake.
Proficient in leveraging Azure Databricks and Spark for distributed data processing, manipulation, and transformation tasks using Python and SQL (an illustrative PySpark sketch follows this summary).
Implemented Azure Data Factory to orchestrate seamless data workflows, optimizing data processing efficiency.
Expertise in storing and retrieving unstructured and semi-structured data with Azure Blob Storage.
Demonstrated expertise in utilizing Azure Event Hub for real-time streaming data ingestion.
Proficient in utilizing Azure Synapse Pipelines for orchestrating and managing complex data integration and transformation workflows.
Created end-to-end data workflows with serverless solutions such as Azure Functions and Azure Logic Apps.
Integrated BI tools such as Tableau and Power BI with SQL databases and data warehouses, developing data connectors and pipelines to enable interactive data visualization and dashboarding.
Adept at designing cloud-based data warehouse solutions using Snowflake on Azure, optimizing schemas, tables, and views for efficient data storage and retrieval.
Demonstrated expert-level proficiency in using SnowSQL to retrieve and manipulate large datasets in Snowflake data warehouses.
Developed, enhanced, and maintained Snowflake database applications, including crafting logical and physical data models and incorporating necessary changes and improvements.
Experience with Snowflake Multi-Cluster Warehouses, Snowflake cloning, and Time Travel.
Expertise working with AWS cloud services such as EMR, S3, and CloudWatch for big data development.
Developed and maintained ETL workflows using AWS Glue to move data from various source systems to AWS Redshift.
Developed data pipelines using AWS services such as Lambda, Kinesis Streams, and SQS.
Implemented complex data integration solutions using Informatica IICS, orchestrating the extraction, transformation, and loading (ETL) processes for structured and unstructured data sources.
Proficient in languages such as Python and Scala, enabling seamless integration of custom functionalities into data pipelines.
Leveraged technologies such as Apache Airflow and custom-built orchestration frameworks to ensure seamless data movement and synchronization.
Proficient in designing and developing DWH solutions, architecting ETL strategies, and utilizing SQL, PySpark, and Spark SQL for data manipulation and analysis.
Implemented optimized indexing strategies in PostgreSQL.
Leveraged Terraform to automate the enforcement of security policies and compliance standards for Azure data services, integrating Azure Policy and Azure Role-Based Access Control (RBAC) within infrastructure code.
Proficiently utilized Docker for containerization and Kubernetes for orchestration to streamline the development, deployment, and management of scalable pipelines and applications.
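The following is a minimal, illustrative PySpark sketch of the kind of Databricks transformation work described above; the storage account, container, and column names are hypothetical placeholders, not details of any actual project.

    # Hypothetical example: read raw CSVs from ADLS, clean them, and write partitioned Parquet.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("ingest_sales").getOrCreate()

    raw = (spark.read
           .option("header", "true")
           .csv("abfss://raw@examplestorage.dfs.core.windows.net/sales/"))

    cleaned = (raw
               .dropDuplicates(["order_id"])                       # remove duplicate orders
               .withColumn("order_date", F.to_date("order_date"))  # normalize the date column
               .filter(F.col("amount").isNotNull()))               # drop rows missing an amount

    (cleaned.write
     .mode("overwrite")
     .partitionBy("order_date")
     .parquet("abfss://curated@examplestorage.dfs.core.windows.net/sales/"))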
TECHNICAL SKILLS:
Azure Cloud Services: Azure Data Lake, Azure Data Factory, Azure Databricks, Application Insights, Key Vault, Azure Blob Storage, Event Hub, Logic Apps, Function Apps, Snowflake
Big Data Technologies: HDFS, YARN, MapReduce, Hive, Sqoop, Flume, HBase, PySpark, Kafka, Oozie, Zookeeper
Web Technologies: HTML, CSS, XML, JDBC, JSP, REST API, JavaScript
Databases: MS SQL Server, Teradata, Oracle 11g/12c, MySQL, NoSQL, Cassandra, Cosmos DB, DB2
Languages: Python, Scala, SQL
Version Control Tools: SVN, GitHub, Bitbucket, GitLab
Hadoop Distributions: Cloudera, Hortonworks
Visualization Tools: Power BI, Tableau
ETL Tools: Informatica, SSIS, SSRS
IDE & Build Tools: PyCharm, Visual Studio

EDUCATION:
Master's in Computer Science, University of Central Missouri

WORK EXPERIENCE:

Evicore Healthcare, Franklin, TN Feb 2023 - Present
Azure Snowflake Data Engineer
Responsibilities:
Implemented end-to-end data pipelines using Azure Data Factory to extract, transform, and load (ETL) data from diverse sources into Snowflake.
Leveraged Azure Data Lake Storage as a data lake for storing both processed and unprocessed data, while putting data retention and partitioning strategies into place.
Optimized data pipelines and PySpark jobs in Azure Databricks through advanced techniques such as Spark performance tuning, data caching, and data partitioning, resulting in superior performance and efficiency.
Leveraged Azure Event Hubs for high-volume, low-latency ingestion of POS transactions, inventory updates, and customer interactions.
Stored streaming data in Azure Data Lake Storage to decouple ingestion from processing.
Employed compression and encryption strategies to strengthen data security and reduce storage costs, and leveraged Azure Blob Storage for effective file storage and retrieval.
Integrated Azure Logic Apps for automating business processes and workflow orchestration in data integration and processing tasks.
Developed and implemented Azure Functions for data pipeline tasks, including validation, enrichment, and preprocessing.
Developed triggers within Azure Synapse Analytics to automate data processing tasks, improving efficiency.
Integrated Azure services with APIs to implement data pipelines that consume and produce data via APIs.
Managed code repositories using Azure DevOps Git, ensuring that code changes are tracked and versioned appropriately.
Implemented robust error handling mechanisms for API interactions with Azure.
Deployed ADF pipelines to the production environment, monitoring, managing, and optimizing data solutions.
Designed and implemented data pipelines using the Medallion architecture to ingest, transform, and deliver large volumes of structured and unstructured data.
Implemented Delta Lake's schema enforcement and data quality checks to ensure data consistency, integrity, and reliability across diverse datasets (see the Delta Lake sketch after this role's Environment line).
Leveraged Snowflake's Time Travel feature, ensuring optimal data management and regulatory compliance (see the Time Travel sketch after this role's Environment line).
Implemented a cloud-based data warehouse solution using Snowflake on Azure, leveraging its scalability and performance capabilities.
Created and optimized Snowflake schemas, tables, and views to support efficient data storage and retrieval for analytics and reporting purposes.
Integrated Snowflake seamlessly with Power BI and Azure Analysis Services to deliver interactive dashboards and reports, empowering business users with self-service analytics capabilities.
Implemented partitioning, indexing, and caching strategies in Snowflake to enhance query performance and reduce processing time.
Proficient in Snowflake integration, connecting with different data connectors, REST APIs, and Spark.
Participated in the development, improvement, and maintenance of Snowflake database applications.
Leveraged Python and PySpark to preprocess and transform large-scale datasets, ensuring data quality and consistency before ingestion into Databricks.
Developed and optimized Spark SQL scripts using Scala for faster data processing.
Designed and performed modeling of NoSQL database schemas using Cassandra and Cosmos DB based on data requirements.
Designed and implemented real-time data processing solutions using Kafka and Spark Streaming, enabling the ingestion, transformation, and analysis of high-volume streaming data.
Designed and implemented efficient and normalized database schemas using dbt.
Integrated ADLS Gen2 with Apache Spark and Azure Databricks for scalable data processing and analytics, enabling real-time insights.
Utilized Git for version control, JIRA for project management, and Jenkins for continuous integration and deployment (CI/CD) processes.
Implemented a comprehensive metadata management framework to catalog, standardize, and govern data assets, facilitating efficient data discovery, lineage tracing, and compliance with regulatory requirements.
Environment: Azure Databricks, Azure Data Factory, Synapse, Azure Service Bus, Snowflake, Logic Apps, Function App, Power BI, Tableau, Delta Lake, MS SQL, HDFS, MapReduce, YARN, Spark, Hive, SQL, Python, Scala, PySpark, shell scripting, Git, JIRA, Jenkins, Kafka, ADF Pipeline, Informatica, IICS, HBase.
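Below is a brief, hypothetical sketch of Delta Lake's schema enforcement on Databricks, included only to illustrate the technique referenced above; the mount paths and table layout are assumptions, not project details.

    # Hypothetical example: append new records to a Delta table; schema enforcement
    # rejects writes whose schema does not match the existing table.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()

    updates = spark.read.parquet("/mnt/bronze/claims/")  # placeholder source path

    # By default, Delta Lake fails the append if the incoming schema diverges from the
    # target table, surfacing bad upstream data instead of silently corrupting the table.
    (updates.write
     .format("delta")
     .mode("append")
     .save("/mnt/silver/claims/"))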
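And a short, assumed sketch of a Snowflake Time Travel query issued through snowflake-connector-python; the connection parameters and table name are placeholders.

    # Hypothetical example: query a table as it existed one hour ago using Time Travel.
    import snowflake.connector

    conn = snowflake.connector.connect(
        account="example_account",
        user="example_user",
        password="********",
        warehouse="ANALYTICS_WH",
        database="SALES_DB",
        schema="PUBLIC",
    )
    cur = conn.cursor()
    cur.execute("SELECT COUNT(*) FROM orders AT(OFFSET => -3600)")
    print(cur.fetchone())
    cur.close()
    conn.close()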
PepsiCo, Dallas, Texas Aug 2021 - Jan 2023
Azure Data Engineer
Responsibilities:
Designed and implemented scalable data ingestion pipelines using Azure Data Factory, efficiently ingesting data from diverse sources such as SQL databases, CSV files, and REST APIs.
Developed robust data processing workflows leveraging Azure Databricks and Spark for distributed data processing and transformation tasks.
Ingested data into Databricks Delta tables and implemented efficient data loading strategies, considering factors like partitioning and clustering.
Developed real-time data streaming capabilities into Snowflake by seamlessly integrating Azure Event Hubs and Azure Functions, enabling prompt and reliable data ingestion.
Leveraged Azure Synapse Analytics to seamlessly integrate big data processing and analytics capabilities, empowering data exploration and insights generation.
Implemented Azure Logic Apps to trigger automated processes upon receiving new emails with attachments, efficiently loading the files into Blob Storage.
Developed and deployed Azure Functions to handle critical data preprocessing, enrichment, and validation tasks within the data pipelines, elevating overall data quality and reliability.
Utilized Azure DevOps Git repositories to store and manage code for data pipelines and other scripts.
Created and managed Azure Data Factory (ADF) pipelines for orchestrating complex data workflows and ETL processes.
Implemented partitioning strategies in Azure to enhance query performance and reduce processing time.
Employed snowflake schema modeling techniques to normalize dimension hierarchies and reduce data redundancy, enhancing data integrity and facilitating more efficient data loading and retrieval processes.
Created and optimized Snowflake schemas, tables, and views to facilitate efficient data storage and retrieval, catering to advanced analytics and reporting requirements.
Enforced advanced techniques such as partitioning, indexing, and caching in Snowflake to enhance query performance and reduce processing time.
Utilized Tableau Desktop to perform data exploration, aggregation, and visualization using a variety of charts, graphs, and maps.
Enhanced data visualization and decision-making with Power BI across organizational operations.
Automated unit tests using Azure DevOps pipelines or similar CI/CD tools to ensure code quality and reliability throughout the development lifecycle.
Utilized Azure Event Hub and IoT Hub to ingest and process streaming data from IoT devices, enabling real-time analytics and actionable insights.
Used Azure Data Catalog to organize data assets and derive more value from existing data investments.
Demonstrated proficiency in scripting languages such as Python and Scala, enabling efficient data manipulation and integration of custom functionalities.
Developed and fine-tuned high-performance PySpark jobs to handle complex data transformations, aggregations, and machine learning tasks on large-scale datasets.
Worked on reading and writing multiple data formats such as JSON, ORC, and Parquet on HDFS using PySpark.
Developed a data pipeline using Kafka, Spark, and Hive to ingest, transform, and analyze retail data (see the streaming sketch after this role's Environment line).
Leveraged Delta Lake for scalable metadata handling and unified streaming and batch processing.
Executed Hive scripts through Hive on Spark and Spark SQL, effectively supporting ETL tasks, maintaining data integrity, and ensuring pipeline stability.
Worked proficiently within Agile methodologies, actively participating in daily stand-ups and coordinated planning sessions.
Environment: Databricks, Glue, Azure Data Grid, Azure Data Factory, Azure Synapse Analytics, Azure Data Catalog, Azure Service Bus, Delta Lake, Blob Storage, Power BI, Airflow v1.9.0, Snowflake, Cosmos DB, Python, PySpark, Scala, SQL, Kafka, Oozie, HBase, Oracle, Teradata, Cassandra, Tableau, Maven, Git, Jira.
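To illustrate the retail streaming pipeline described above, here is a minimal, assumed sketch of a Spark Structured Streaming job that reads events from Kafka; the broker address, topic, schema, and output paths are hypothetical.

    # Hypothetical example: stream POS events from Kafka into a Delta table.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F
    from pyspark.sql.types import StructType, StringType, DoubleType

    spark = SparkSession.builder.appName("pos_stream").getOrCreate()

    schema = (StructType()
              .add("store_id", StringType())
              .add("sku", StringType())
              .add("amount", DoubleType()))

    events = (spark.readStream
              .format("kafka")
              .option("kafka.bootstrap.servers", "broker:9092")   # placeholder broker
              .option("subscribe", "pos-transactions")            # placeholder topic
              .load()
              .select(F.from_json(F.col("value").cast("string"), schema).alias("e"))
              .select("e.*"))

    (events.writeStream
     .format("delta")
     .option("checkpointLocation", "/mnt/checkpoints/pos/")
     .start("/mnt/silver/pos_transactions/"))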
Verizon, Atlanta, Georgia May 2019 - Jul 2021
Big Data Engineer
Responsibilities:
Developed and optimized ETL workflows using AWS Glue to extract, transform, and load data from diverse sources into Redshift for efficient data processing (see the Glue sketch after this role's Environment line).
Utilized AWS S3 for temporary storage of raw data and checkpointing, and AWS Redshift for complex transformations and aggregations.
Implemented AWS Athena for ad-hoc data analysis and querying on data stored in AWS S3.
Designed and implemented data streaming solutions using AWS Kinesis, enabling real-time data processing and analysis.
Implemented robust data pipelines within the AWS tech stack, ensuring efficient data ingestion, transformation, and loading processes from diverse sources into the data lake.
Effectively managed DNS configurations and routing using AWS Route53, ensuring efficient deployment of applications and services.
Leveraged AWS Lambda for building scalable and cost-effective data processing pipelines, automating ETL workflows, and executing real-time data transformations.
Implemented robust IAM policies and roles to ensure secure user access and permissions for AWS resources.
Worked on Snowflake to implement UDFs and load data from S3 into Snowflake to generate datasets.
Developed ETL pipelines using Spark and Hive for performing various business-specific transformations.
Developed scripts to run Oozie workflows, capture the logs of all jobs that run on the cluster, and create a metadata table specifying the execution times of each job.
Developed Oozie workflows for executing Sqoop and Hive actions, and worked with NoSQL databases like HBase, creating HBase tables to load large sets of semi-structured data coming from various sources.
Extensively worked on creating combiners, partitioning, and distributed cache to enhance the performance of MapReduce jobs.
Demonstrated prowess in leveraging Hive for intricate data transformations, event joins, and pre-aggregations, optimizing data operations before storage on HDFS.
Converted existing MapReduce applications to PySpark applications as part of an overall effort to streamline legacy jobs and create a new framework.
Effectively utilized Git and GitHub repositories to maintain source code, enabling effective version control and collaboration.
Environment: AWS, S3, Redshift, EMR, SNS, SQS, Athena, Glue, CloudWatch, Kinesis, Route53, IAM, Sqoop, MySQL, HDFS, Apache Spark, Hive, Cloudera, Kafka, Zookeeper, Oozie, PySpark, Python, JIRA, Control-M, Airflow, Teradata, Oracle, SQL.
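The following is a brief, assumed sketch of a Glue ETL script of the kind referenced above, loading a cataloged S3 dataset into Redshift; the database, table, connection, and bucket names are illustrative placeholders.

    # Hypothetical example: AWS Glue (PySpark) job that loads a cataloged dataset into Redshift.
    import sys
    from awsglue.transforms import ApplyMapping
    from awsglue.utils import getResolvedOptions
    from awsglue.context import GlueContext
    from awsglue.job import Job
    from pyspark.context import SparkContext

    args = getResolvedOptions(sys.argv, ["JOB_NAME"])
    glueContext = GlueContext(SparkContext.getOrCreate())
    job = Job(glueContext)
    job.init(args["JOB_NAME"], args)

    # Read the source table registered in the Glue Data Catalog (placeholder names).
    source = glueContext.create_dynamic_frame.from_catalog(
        database="raw_db", table_name="clickstream")

    # Rename/cast columns to match the Redshift target schema.
    mapped = ApplyMapping.apply(
        frame=source,
        mappings=[("user_id", "string", "user_id", "string"),
                  ("event_ts", "string", "event_ts", "timestamp")])

    # Write to Redshift through a Glue catalog connection, staging via S3.
    glueContext.write_dynamic_frame.from_jdbc_conf(
        frame=mapped,
        catalog_connection="redshift-conn",
        connection_options={"dbtable": "analytics.clickstream", "database": "dw"},
        redshift_tmp_dir="s3://example-temp-bucket/glue/")

    job.commit()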