
Data Engineer Azure Resume Dallas, TX

Candidate Information
Title: Data Engineer Azure
Target Location: US-TX-Dallas
Candidate's Name
Azure Data Engineer
Phone: PHONE NUMBER AVAILABLE | Email: EMAIL AVAILABLE | LinkedIn

PROFESSIONAL SUMMARY:
- Certified Azure Data Engineer with 11+ years of experience in designing and implementing scalable data ingestion pipelines using Microsoft Azure Cloud, Python, PySpark, and Big Data technologies.
- Hands-on experience in Azure Cloud Services (PaaS & IaaS), Azure Synapse Analytics, SQL Azure, Data Factory, Azure Analysis Services, Application Insights, Azure Monitoring, Key Vault, and Azure Data Lake.
- Proficient in leveraging Azure Databricks and Spark for distributed data processing and transformation tasks.
- Expertise in storing and retrieving unstructured and semi-structured data with Azure Blob Storage.
- Demonstrated expertise in utilizing Azure Event Hub for real-time streaming data ingestion.
- Secured and managed sensitive cryptographic keys and secrets by leveraging Azure Key Vault, ensuring robust data protection and compliance in cloud-based environments.
- Created end-to-end data workflows with serverless solutions, Azure Functions, and Azure Logic Apps.
- Proficient in utilizing Azure Synapse Pipelines for orchestrating and managing complex data integration and transformation workflows.
- Developed and maintained ETL workflows using AWS Glue to move data from various source systems to AWS Redshift.
- Developed data pipelines using AWS services like Lambda, Kinesis Streams, and SQS.
- Designed and created data models using DBT, defining transformations and aggregations to create structured and organized datasets for analysis.
- Applied advanced techniques such as partitioning, indexing, and caching in Azure services to enhance query performance and reduce processing time.
- Hands-on working experience with a diverse range of file formats, including CSV, JSON, Parquet, and Avro, to efficiently store, process, and exchange data within data engineering pipelines and analytics workflows.
- Proficient in languages like Python and Scala, enabling seamless integration of custom functionalities into data pipelines.
- Experience in developing Spark applications in Python (PySpark) on distributed environments to load large CSV data files with differing schemas into Hive ORC tables (see the sketch following this summary).
- Exceptional command of Kafka streaming technology, adeptly utilizing its distributed messaging capabilities to construct resilient and high-performing data flows.
- Adept at designing cloud-based data warehousing solutions using Snowflake on Azure, optimizing schemas, tables, and views for streamlined data storage and retrieval.
- Expert-level proficiency in using SnowSQL to retrieve and manipulate large datasets in Snowflake data warehouses.
- Developed, enhanced, and maintained Snowflake database applications, including crafting logical and physical data models and incorporating necessary changes and improvements.
- Defined roles and privileges to ensure controlled access to various database objects within the Snowflake ecosystem.
- Automated ETL workflows and scheduled data integration tasks using Informatica's Workflow Manager and Scheduler.
- Proficient in designing and developing DWH solutions, architecting ETL strategies, and utilizing SQL, PySpark, and Spark SQL for data manipulation and analysis.
- Developed NoSQL database schemas using Cassandra and Cosmos DB, aligning with data specifications and requirements.
- Worked with Hadoop distributions such as Cloudera and Hortonworks in managing, processing, and analyzing big data.
- Highly skilled in utilizing Hadoop, HDFS, MapReduce, Hive, and Spark SQL for efficient ETL tasks, batch data processing, and analytics.
- Expert in optimizing query performance in Hive by designing and implementing bucketing and partitioning strategies to enable efficient data retrieval and storage optimization.
- Leveraged technologies like Apache Airflow and custom-built orchestration frameworks to ensure seamless data movement and synchronization.
- Demonstrated expertise in implementing advanced serialization techniques like SerDe in the Hadoop ecosystem to optimize data storage, transfer, and deserialization processes.
- Expertise in establishing workflows for Hadoop job scheduling and management using Apache Oozie and Control-M.
- Configured and managed ZooKeeper to ensure efficient coordination and synchronization of distributed data processing systems.
- Collaborated seamlessly with data analysts and stakeholders to implement well-aligned data models, structures, and designs.
- Highly proficient in Agile methodologies, including JIRA for project management and reporting.
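Illustrative sketch (not taken from any employer's codebase): a minimal PySpark job of the kind described above, loading large CSV files and writing them to a partitioned Hive ORC table. The paths, column handling, and database/table names are hypothetical.

from pyspark.sql import SparkSession, functions as F

# Minimal sketch: load raw CSV files and append them to a partitioned Hive ORC table.
# All paths and the database/table names below are hypothetical.
spark = (SparkSession.builder
         .appName("csv_to_hive_orc")
         .enableHiveSupport()
         .getOrCreate())

raw = (spark.read
       .option("header", "true")
       .option("inferSchema", "true")
       .csv("/data/landing/sales/*.csv"))        # hypothetical landing directory

cleaned = (raw.dropDuplicates()
              .withColumn("load_date", F.current_date()))

(cleaned.write
        .mode("append")
        .format("orc")
        .partitionBy("load_date")
        .saveAsTable("analytics.sales_orc"))     # hypothetical Hive database.table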
TECHNICAL SKILLS:
Azure Cloud Services: Azure Data Lake, Azure Data Factory, Azure Databricks, Application Insights, Key Vault, Azure Blob Storage, Event Hub, Logic Apps, Function Apps, Snowflake
Big Data Technologies: HDFS, YARN, MapReduce, Hive, Sqoop, Flume, HBase, PySpark, Kafka
Web Technologies: HTML5, CSS3, XML, JDBC, JSP, REST API
Databases: MS SQL Server, Teradata, Oracle 11g/12c, MySQL, NoSQL, Cassandra, Cosmos DB, DB2
Languages: Python, Scala, SQL
Version Control Tools: SVN, GitHub, Bitbucket, GitLab
Hadoop Distributions: Cloudera, Hortonworks
Visualization Tools: Power BI, Tableau
ETL Tools: Informatica, SSIS, SSRS
IDE & Build Tools: PyCharm, Visual Studio

EDUCATION:
Bachelor's in Computer Science from Saveetha School of Engineering, Chennai, India.

WORK EXPERIENCE:

Citi Bank (July 2022 to Present)
Sr. Azure Data Engineer
Responsibilities:
- Designed and implemented end-to-end data pipelines using Azure Data Factory to facilitate efficient data ingestion, transformation, and loading (ETL) from diverse data sources.
- Designed and developed complex ETL workflows using Informatica PowerCenter to extract, transform, and load data from various sources into data warehouses and data lakes.
- Tuned Informatica mappings and workflows for optimal performance, reducing execution time and improving resource utilization.
- Deployed Azure Data Lake Storage as a reliable and scalable data lake solution, implementing efficient data partitioning and retention strategies to store and manage both raw and processed data effectively.
- Optimized data pipelines and PySpark jobs in Azure Databricks through advanced techniques like Spark performance tuning, data caching, and data partitioning, resulting in superior performance and efficiency.
- Leveraged Azure Event Hubs for high-volume, low-latency ingestion of POS transactions, inventory updates, and customer interactions.
- Employed Azure Databricks streaming jobs for real-time data filtering, aggregation, and enrichment, combining POS data with product information (see the streaming sketch after this section).
- Stored streaming data in Azure Data Lake Storage to decouple ingestion from processing.
- Employed Azure Blob Storage for optimized data file storage and retrieval, implementing advanced techniques like compression and encryption to bolster data security and streamline storage costs.
- Integrated Azure Logic Apps seamlessly into data workflows, ensuring comprehensive orchestration and triggering of complex data operations based on specific events, enhancing overall data pipeline efficiency.
- Managed code repositories using Azure DevOps Git, ensuring that code changes are tracked and versioned appropriately.
- Incorporated Azure DevOps practices to enhance the Continuous Integration/Continuous Deployment (CI/CD) pipeline efficiency of Machine Learning (ML) cloud infrastructure.
- Enforced comprehensive data quality checks using Azure Data Factory, guaranteeing high standards of data accuracy and consistency.
- Integrated Azure services with APIs to implement data pipelines that consume and produce data via APIs.
- Implemented robust error handling mechanisms for API interactions with Azure.
- Integrated Azure IoT Hub with Azure Event Hubs for scalable and reliable event streaming.
- Employed a medallion architecture to design and organize data layers in Azure Data Lake.
- Developed custom monitoring and alerting solutions using Azure Monitor, providing proactive identification and resolution of performance bottlenecks.
- Deployed ADF pipelines to the production environment, monitoring, managing, and optimizing data solutions.
- Worked on Azure Synapse to architect and execute advanced analytics, enabling predictive analytics and data-driven insights.
- Designed and implemented data pipelines to ingest, transform, and load data from various sources into Synapse.
- Optimized queries and jobs to improve the performance of data processing and analytics tasks within Synapse.
- Leveraged Snowflake's Time Travel feature, ensuring optimal data management and regulatory compliance.
- Integrated Snowflake with different data connectors, REST APIs, and Spark.
- Optimized PySpark jobs by tuning Spark configurations and utilizing efficient data partitioning strategies, resulting in a 30% reduction in processing time.
- Integrated Snowflake seamlessly with Power BI and Azure Analysis Services to deliver interactive dashboards and reports, empowering business users with self-service analytics capabilities.
- Implemented partitioning, indexing, and caching strategies in Snowflake to enhance query performance and reduce processing time.
- Participated in the development, improvement, and maintenance of Snowflake database applications.
- Architected and optimized high-performing Snowflake schemas, tables, and views to accommodate complex analytical queries and reporting requirements, ensuring exceptional scalability and query performance.
- Utilized Python to create automated data processing programs that increase productivity and accuracy.
- Configured event-based triggers and scheduling mechanisms to automate data pipelines and workflows.
- Designed and modeled NoSQL database schemas using Cassandra and Cosmos DB based on data requirements.
- Implemented data pipelines for data ingestion into NoSQL databases and integrated them with other data storage systems and processing frameworks.
- Designed and implemented efficient, normalized database schemas using dbt.
- Integrated data from various sources into dbt, creating transformation models and SQL-based logic to achieve desired analytics outcomes.
- Developed a data pipeline using Kafka, Spark, and Hive to ingest, transform, and analyze retail data.
- Designed and implemented real-time data processing solutions using Kafka and Spark Streaming, enabling the ingestion, transformation, and analysis of high-volume streaming data.
- Successfully led and managed end-to-end Azure data engineering projects from conception to production deployment.
- Developed and optimized Spark SQL scripts using Scala for faster data processing.
- Designed and implemented efficient data archiving and retention strategies utilizing Azure Blob Storage.
- Collaborated closely with cross-functional teams, including data scientists, data analysts, and business stakeholders, ensuring alignment with data requirements and delivering scalable and reliable data solutions.
- Identified opportunities in business processes and system capabilities and delivered methodologies for continuous improvement.
Environment: Azure Databricks, Azure Data Factory, Snowflake (including Time Travel), Logic Apps, Informatica, Function App, Cloud SQL, MS SQL, Oracle, Spark, SQL, DBT, Python, Scala, shell scripting, REST APIs, Git, JIRA, Jenkins, Kafka, ADF pipelines, Power BI.
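Illustrative sketch of the streaming pattern described in this role: reading POS events through Event Hubs' Kafka-compatible endpoint with Spark Structured Streaming and appending them to a Delta table in ADLS. The namespace, topic, schema, and storage paths are hypothetical, and authentication options are omitted.

from pyspark.sql import SparkSession, functions as F
from pyspark.sql.types import StructType, StructField, StringType, DoubleType, TimestampType

spark = SparkSession.builder.appName("pos_stream_to_delta").getOrCreate()

# Hypothetical schema for point-of-sale events arriving as JSON.
pos_schema = StructType([
    StructField("store_id", StringType()),
    StructField("sku", StringType()),
    StructField("amount", DoubleType()),
    StructField("event_time", TimestampType()),
])

# Event Hubs exposes a Kafka-compatible endpoint; the namespace and topic are placeholders,
# and the SASL/security options needed to authenticate are omitted from this sketch.
events = (spark.readStream
          .format("kafka")
          .option("kafka.bootstrap.servers", "<namespace>.servicebus.windows.net:9093")
          .option("subscribe", "pos-transactions")
          .option("startingOffsets", "latest")
          .load())

# Parse the JSON payload into typed columns.
parsed = (events
          .select(F.from_json(F.col("value").cast("string"), pos_schema).alias("e"))
          .select("e.*"))

# Append the parsed events to a bronze Delta table in ADLS (hypothetical container/account).
query = (parsed.writeStream
         .format("delta")
         .option("checkpointLocation", "abfss://checkpoints@<account>.dfs.core.windows.net/pos")
         .outputMode("append")
         .start("abfss://bronze@<account>.dfs.core.windows.net/pos_transactions"))

query.awaitTermination()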
British American Tobacco, Charlotte, NC (Sep 2019 to June 2022)
Azure Data Engineer
Responsibilities:
- Designed and implemented scalable data ingestion pipelines using Azure Data Factory, efficiently ingesting data from diverse sources such as SQL databases, CSV files, and REST APIs.
- Developed robust data processing workflows leveraging Azure Databricks and Spark for distributed data processing and transformation tasks.
- Implemented data quality checks and transformations using Informatica Data Quality (IDQ) to ensure accuracy, completeness, and consistency of data.
- Utilized advanced transformation techniques in Informatica, such as lookups, aggregations, joins, and complex expressions, to process and manipulate data.
- Ensured data quality by implementing validation rules and transformations using Informatica Data Quality.
- Ensured data quality and integrity through comprehensive data validation, cleansing, and transformation operations performed using Azure Data Factory and Databricks.
- Ingested data into Databricks Delta tables and implemented efficient data loading strategies, considering factors like partitioning and clustering.
- Developed real-time data streaming capabilities into Snowflake by seamlessly integrating Azure Event Hubs and Azure Functions, enabling prompt and reliable data ingestion.
- Leveraged Azure Synapse Analytics to seamlessly integrate big data processing and analytics capabilities, empowering data exploration and insight generation.
- Automated data pipelines and workflows by configuring event-based triggers and scheduling mechanisms, streamlining data processing and delivery, which resulted in a 48% reduction in manual intervention.
- Developed and deployed Azure Functions to handle critical data preprocessing, enrichment, and validation tasks within the data pipelines, elevating overall data quality and reliability.
- Utilized Azure DevOps Git repositories to store and manage code for data pipelines and other scripts.
- Implemented partitioning strategies in Azure to enhance query performance and reduce processing time.
- Implemented comprehensive data lineage and metadata management solutions, ensuring end-to-end visibility and governance over data flow and transformations.
- Identified and resolved performance bottlenecks within data processing and storage layers, optimizing query execution and reducing data latency.
- Designed and implemented custom PySpark UDFs (user-defined functions) to extend the functionality of Spark SQL and DataFrames for complex data processing tasks (see the sketch after this section).
- Created and maintained data models, including logical, physical, and dimensional models, to ensure efficient data storage and retrieval.
- Managed the ingestion of data from various sources into Synapse, ensuring data quality and integrity.
- Conducted meticulous performance tuning and capacity planning exercises, ensuring scalability and maximizing efficiency within the data infrastructure.
- Created and maintained data models to optimize query performance and facilitate data analysis.
- Demonstrated proficiency in scripting languages like Python and Scala, enabling efficient data manipulation and integration of custom functionalities.
- Developed and fine-tuned high-performance PySpark jobs to handle complex data transformations, aggregations, and machine learning tasks on large-scale datasets.
- Developed end-to-end data pipelines using Kafka, Spark, and Hive, enabling seamless data ingestion, transformation, and analysis.
- Leveraged Kafka and Spark Streaming to process and analyze streaming data, contributing to real-time data processing and insight generation and improving real-time analytics capabilities by 30%.
- Utilized Spark Core and Spark SQL scripts written in Scala to expedite data processing and enhance performance.
- Architected and implemented a cloud-based data warehousing solution utilizing Snowflake on Azure, harnessing its exceptional scalability and performance capabilities.
- Applied advanced techniques such as partitioning, indexing, and caching in Snowflake to enhance query performance and reduce processing time.
- Created and optimized Snowflake schemas, tables, and views to facilitate efficient data storage and retrieval, catering to advanced analytics and reporting requirements.
- Collaborated closely with data analysts and business stakeholders to deeply understand their needs and implement well-aligned data models and structures within Snowflake.
- Executed Hive scripts through Hive on Spark and Spark SQL, effectively supporting ETL tasks, maintaining data integrity, and ensuring pipeline stability.
- Worked proficiently within Agile methodologies, actively participating in daily stand-ups and coordinated planning sessions.
Environment: Azure Databricks, Data Factory, Azure Storage, Key Vault, Logic Apps, Informatica, Function App, Snowflake, MS SQL, Oracle, HDFS, MapReduce, YARN, Spark, Hive, SQL, Python, Scala, PySpark, Git, JIRA, Jenkins, Kafka, ADF pipelines, REST APIs.
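Illustrative sketch of the custom PySpark UDF pattern mentioned in this role; the normalization rule and the table and column names are hypothetical.

from pyspark.sql import SparkSession, functions as F
from pyspark.sql.types import StringType

spark = SparkSession.builder.appName("udf_example").enableHiveSupport().getOrCreate()

# Hypothetical UDF that normalizes free-text product codes before a join to reference data.
@F.udf(returnType=StringType())
def normalize_code(code):
    if code is None:
        return None
    return code.strip().upper().replace("-", "")

orders = spark.table("silver.orders")        # hypothetical source tables
products = spark.table("silver.products")

enriched = (orders
            .withColumn("product_code", normalize_code(F.col("raw_product_code")))
            .join(products, on="product_code", how="left"))

enriched.write.mode("overwrite").saveAsTable("gold.orders_enriched")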
WellCare, Tampa, FL (April 2016 to Aug 2019)
Big Data Engineer
Responsibilities:
- Demonstrated hands-on experience in Azure Cloud Services, including Azure Synapse Analytics, SQL Azure, Data Factory, Azure Analysis Services, Application Insights, Azure Monitoring, Key Vault, and Azure Data Lake.
- Created batch and streaming pipelines in Azure Data Factory (ADF) using Linked Services, Datasets, and Pipelines to efficiently extract, transform, and load data.
- Developed Azure Data Factory (ADF) batch pipelines to ingest data from relational sources into Azure Data Lake Storage (ADLS Gen2) incrementally, applying necessary data cleansing and subsequently loading it into Delta tables.
- Implemented Azure Logic Apps to trigger automated processes upon receiving new emails with attachments, efficiently loading the files into Blob Storage.
- Implemented CI/CD pipelines using Azure DevOps in the cloud, utilizing Git, Maven, and Jenkins plugins for seamless code integration and deployment.
- Built a Spark Streaming application for real-time analytics on streaming data, leveraging Spark SQL to query and aggregate data in real time and visualize the results in Power BI or Azure Data Studio.
- Developed Spark Streaming applications that integrate with event-driven architectures such as Azure Functions or Azure Logic Apps, processing events in real time and triggering downstream workflows based on the results.
- Designed and implemented data pipelines using Lambda functions to ingest streaming data from various sources.
- Utilized AWS S3 for temporary storage of raw data and checkpointing, and AWS Redshift for complex transformations and aggregations.
- Utilized AWS SQS to decouple data ingestion from processing for scalability and reliability.
- Defined and enforced data access controls based on user roles and permissions (RBAC).
- Built ML models with cross-functional teams using AWS SageMaker to predict target variables using real-time and historical data.
- Created Hive tables, loaded and analyzed data using Hive queries, and developed custom Hive UDFs to extend Hive's functionality.
- Integrated data from various sources such as databases, APIs, logs, and files by writing Java code to extract, transform, and load (ETL) data into the desired format.
- Migrated ETL processes from Oracle to Hive, testing and validating the ease of data manipulation and processing in Hive.
- Implemented solutions to reprocess failed messages in Kafka using offset IDs, ensuring data integrity and reliability.
- Tuned queries to improve performance within PostgreSQL, minimizing unnecessary data retrieval and I/O operations.
- Assessed the strengths and limitations of NoSQL databases like Cosmos DB and structured data efficiently for querying.
- Debugged data pipeline issues, optimized performance within PostgreSQL and across the data ecosystem, and scaled infrastructure as needed.
- Developed a Spark job in Scala to index data into Azure Functions from external Hive tables stored in HDFS.
- Utilized HiveQL to analyze partitioned and bucketed data, optimizing query performance and efficiency (see the sketch after this section).
- Developed and executed Hive queries on analyzed data for aggregation and reporting purposes.
- Developed Sqoop jobs to efficiently load data from RDBMS sources into external systems like HDFS and Hive.
- Developed Spark applications using PySpark and Spark SQL for data extraction, transformation, and aggregation from multiple file formats, ensuring efficient data processing.
- Implemented Spark scripts using Scala and Spark SQL to access Hive tables for faster data processing.
- Loaded data from UNIX file systems into HDFS, ensuring data availability for further processing and analysis.
- Configured Spark Streaming to receive real-time data from Apache Flume and stored the stream data in Azure Tables using Scala.
- Utilized Spark RDD transformations to filter data and enhance Spark SQL processing capabilities.
- Utilized HiveContext and SQLContext to integrate the Hive metastore with Spark SQL for optimized performance and data processing.
- Utilized Git as the version control system to access repositories and coordinate with CI tools for effective code management and collaboration.
Environment: Azure Data Factory, Azure Synapse Analytics, Azure DevOps, AWS S3, AWS Redshift, AWS Glue, AWS SageMaker, Sqoop, HDFS, Power BI, Git, Zookeeper, Flume, Kafka, PySpark, Spark SQL, Scala, Hive, Hadoop, Cloudera, HBase, HiveQL, MySQL.
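Illustrative Spark SQL sketch of the partitioning-and-bucketing approach referenced in this role; the database, table, and column names are hypothetical.

from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("partitioned_claims")
         .enableHiveSupport()
         .getOrCreate())

# Hypothetical claims table, partitioned by service date and bucketed by member id so that
# date-bounded, member-level queries prune partitions and read fewer files.
spark.sql("CREATE DATABASE IF NOT EXISTS claims_db")
spark.sql("""
    CREATE TABLE IF NOT EXISTS claims_db.claims_orc (
        claim_id     STRING,
        member_id    STRING,
        amount       DOUBLE,
        service_date DATE
    )
    USING ORC
    PARTITIONED BY (service_date)
    CLUSTERED BY (member_id) INTO 32 BUCKETS
""")

daily_totals = spark.sql("""
    SELECT member_id, SUM(amount) AS total_amount
    FROM claims_db.claims_orc
    WHERE service_date = DATE'2019-06-01'   -- filter on the partition column enables pruning
    GROUP BY member_id
""")
daily_totals.show()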
Chevron Corporation, Santa Rosa, NM (Oct 2013 to Mar 2016)
Big Data Engineer
Responsibilities:
- Developed and maintained an ETL framework utilizing Sqoop and Hive to ingest and process data from diverse sources, ensuring availability for consumption.
- Processed HDFS data, created external tables using Hive, and developed reusable scripts for table ingestion.
- Developed robust ETL jobs using Spark and Scala to migrate data from Oracle to new MySQL tables, ensuring smooth data transfer and maintaining data integrity.
- Utilized Spark (RDDs, DataFrames, Spark SQL) and Spark-Cassandra Connector APIs for tasks including data migration and real-time sales analytics (see the sketch after this section).
- Developed a Spark Streaming application for real-time sales analytics, contributing to improved decision-making processes.
- Analyzed source data, handled data type modifications, and generated Power BI ad-hoc reports using Excel sheets, flat files, and CSV files.
- Leveraged Sqoop to efficiently extract data from multiple data sources into HDFS, facilitating seamless data integration.
- Orchestrated data imports from various sources, executed transformations using Hive and MapReduce, and loaded processed data into HDFS.
- Developed robust data ingestion pipelines using Spring Boot, incorporating features such as Spring Data, Spring Integration, and Spring Cloud Stream to efficiently ingest data from diverse sources including databases, APIs, files, and streaming platforms.
- Processed and transformed raw data into analyzable formats using Spring Batch for batch processing tasks and Spring Integration or Spring Cloud Stream for real-time data processing.
- Designed, implemented, and maintained data pipelines using Java frameworks like Apache Beam, Apache Kafka, and Spring Batch to efficiently process and move data across systems.
- Ensured data quality and integrity by implementing data validation checks, error handling mechanisms, and data governance policies within Java code.
- Identified bottlenecks and optimized Java code and queries for performance, scalability, and reliability, especially when dealing with large volumes of data.
- Developed data classification algorithms using MapReduce design patterns, enhancing data processing efficiency and accuracy.
- Analyzed SQL scripts and devised optimal solutions using PySpark, ensuring efficient data processing and transformation.
- Employed advanced techniques including combiners, partitioning, and distributed cache to optimize the performance of MapReduce jobs.
- Worked with a variety of big data technologies including Apache Hive, HBase, Apache Spark, Zookeeper, Flume, and Sqoop.
Environment: Hadoop, Hive, HBase, Zookeeper, Spark, PySpark, Sqoop, Spark SQL, shell scripting, Cassandra, ETL, Oozie, Flume.
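Illustrative sketch of the Spark-with-Cassandra pattern referenced in this role: reading sales records over JDBC, aggregating daily totals, and writing them through the DataStax Spark-Cassandra connector (assumed to be on the classpath along with an Oracle JDBC driver). Hosts, credentials, and table names are hypothetical.

from pyspark.sql import SparkSession, functions as F

# Assumes the DataStax spark-cassandra-connector and an Oracle JDBC driver are available.
spark = (SparkSession.builder
         .appName("daily_sales_to_cassandra")
         .config("spark.cassandra.connection.host", "cassandra.internal")   # hypothetical host
         .getOrCreate())

sales = (spark.read
         .format("jdbc")
         .option("url", "jdbc:oracle:thin:@//oracle-host:1521/ORCL")        # hypothetical DSN
         .option("dbtable", "SALES.TRANSACTIONS")
         .option("user", "etl_user")
         .option("password", "***")
         .load())

daily_totals = (sales
                .groupBy("STORE_ID", F.to_date("TXN_TS").alias("txn_date"))
                .agg(F.sum("AMOUNT").alias("total_amount")))

# Write the aggregates to a Cassandra table (hypothetical keyspace/table).
(daily_totals.write
             .format("org.apache.spark.sql.cassandra")
             .options(keyspace="analytics", table="daily_sales")
             .mode("append")
             .save())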
Apollo Hospitals, Hyderabad, India (Nov 2012 to Sep 2013)
Hadoop Developer
Responsibilities:
- Designed and implemented Hadoop-based data warehouses for Apollo Health Care, enabling efficient storage and retrieval of patient records and medical data.
- Actively participated in Agile Scrum methodology, engaging in daily stand-up meetings.
- Utilized Visual SourceSafe with Visual Studio 2010 for version control and managed project progress using Trello.
- Implemented advanced reporting functionalities in Power BI, including drill-through and drill-down reports with interactive drop-down menus, data sorting capabilities, and subtotals for enhanced data analysis.
- Wrote triggers, stored procedures, and functions using Transact-SQL (T-SQL) and maintained physical database structures (see the sketch after this section).
- Deployed scripts in different environments based on configuration management and playbook requirements.
- Created and managed files and filegroups, established table/index associations, and performed query and performance tuning.
- Implemented cost-effective storage solutions within the Hadoop ecosystem, optimizing data storage and retrieval costs for Apollo Health Care.
- Established robust data backup and recovery mechanisms within the Hadoop infrastructure, ensuring the availability and reliability of healthcare data for Apollo Health Care.
- Gained performance tuning experience with Spark, MapReduce, and SQL jobs.
- Successfully scaled Hadoop clusters at Apollo Health Care to handle increasing volumes of healthcare data, ensuring optimal performance and responsiveness.
- Implemented performance tuning strategies to enhance data processing and analytics capabilities.
- Used source code and version control systems such as SVN, Git, and Bitbucket.
- Maintained users, roles, and permissions within the SQL Server environment.
- Streamlined the deployment of SSIS packages and optimized their execution through the creation of efficient job configurations.
- Automated report generation and cube refresh processes by creating SSIS jobs, ensuring the timely and accurate delivery of critical information.
- Excelled in deploying SSIS packages to production, leveraging various configuration options to export package properties and achieve environment independence.
- Utilized SQL Server Reporting Services (SSRS) to author, manage, and deliver comprehensive reports, both in print and interactive web-based formats.
- Developed robust stored procedures and triggers to enforce data consistency and integrity during data entry operations.
- Collaborated with cross-functional teams to understand their requirements and provided effective technical solutions.
Environment: MS SQL Server 2008/2012, Visual Studio 2010, Apache Spark, Apache Hadoop, Hive, SQL, Shell Scripting, MS Office, MS Access, Git, GitHub.
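Illustrative sketch of the T-SQL stored-procedure work described in this role, executed here from Python via pyodbc; the connection string, schema, and procedure body are hypothetical.

import pyodbc

# Hypothetical connection string; the driver name, server, database, and credentials
# are placeholders for illustration only.
conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=sqlserver.internal;DATABASE=ApolloDW;UID=etl_user;PWD=***"
)
cursor = conn.cursor()

# Hypothetical procedure that rebuilds a patient summary table, illustrating the kind of
# consistency-enforcing T-SQL described above.
cursor.execute("""
    CREATE PROCEDURE dbo.usp_refresh_patient_summary
    AS
    BEGIN
        SET NOCOUNT ON;
        TRUNCATE TABLE dbo.patient_summary;
        INSERT INTO dbo.patient_summary (patient_id, visit_count, last_visit)
        SELECT patient_id, COUNT(*), MAX(visit_date)
        FROM dbo.visits
        GROUP BY patient_id;
    END
""")
conn.commit()
conn.close()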
