Candidate's Name
Azure Data Engineer | Frisco, TX
Phone: Street Address -331-5124 | EMAIL AVAILABLE | https://LINKEDIN LINK AVAILABLE

PROFESSIONAL SUMMARY:
Senior Data Engineer with 10 years of IT experience in data warehousing, Big Data, Spark, Databricks, Azure, data modeling, and data virtualization, currently focused on deepening expertise in cloud and Big Data analytics.
Proficient in working with the Hadoop and Spark ecosystems in both cloud-based (Azure) and on-premises environments.
Skilled in designing data warehouses and data lakes, data modeling for modern and traditional databases, and determining system performance standards.
Involved in developing short-term to long-term technology roadmaps for product development.
Proven ability to establish modern data platform architectures and cloud migration plans.
Experienced in writing programs in Python, shell scripting, and SQL.
Well-versed in configuring, administering, tuning, and architecting data pipelines using reporting tools.
Strong understanding of IaaS and PaaS offerings in Azure, including hub-and-spoke topology, service endpoints, VNets and subnets, NSGs, UDRs, and network traffic routing.
Knowledgeable and experienced in big data tools like Hadoop, Azure Data Lake, and AWS Redshift.
Experienced in performing real-time analytics on NoSQL distributed databases like Cassandra, HBase, and MongoDB.
Skilled in designing clear, attractive data visualization dashboards using Tableau.
Developed Scala scripts and UDFs using DataFrames and RDDs in Spark for data aggregation, queries, and writing data back into OLTP systems.
Created batch data processing with the Spark Scala API and developed data ingestion pipelines using Kafka.
Utilized Flume and Kafka to direct data from various sources to and from HDFS.
Scripted an ETL pipeline in Python that ingests files from AWS S3 into a Redshift table (a sketch of this pattern follows this summary).
Experienced in working with various file formats like ORC, Avro, Parquet, and JSON.
Knowledgeable about using the Databricks platform, Cloudera Manager, and Hortonworks Distribution to monitor and manage clusters.
Expert in working with Linux/UNIX and shell commands at the terminal.
Experienced in developing MapReduce jobs in J2EE/Java for data cleansing, transformation, pre-processing, and analysis.
Skilled in developing OLAP cubes using SQL Server Analysis Services (SSAS).
Experienced in collecting log data and JSON data into HDFS using Flume and processing it using Hive and Pig.
Developed Spark Streaming jobs using RDDs and Spark SQL as required.
Strong knowledge of Hadoop ecosystem components, including HDFS, Hive, Oozie, HBase, Pig, Sqoop, and Zookeeper.
Experienced in installing, configuring, and administering Big Data platforms like Cloudera Manager and MCS of MapR.
Well-versed in data modeling for relational database management systems (RDBMS) and object-based data lakes.
Experienced in working with Hortonworks and Cloudera environments.
Knowledgeable in implementing data processing techniques using Apache HBase.
Excellent experience in installing and running various Oozie workflows and automating parallel job executions.
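
The following is a minimal, illustrative sketch of the S3-to-Redshift ingestion pattern referenced in this summary: a Python script issues a Redshift COPY so the cluster pulls staged files directly from S3. The cluster endpoint, bucket, table, and IAM role names are placeholders, not details from any engagement.

    # Illustrative sketch of a Python S3 -> Redshift ingestion step.
    # All endpoints, names, and credentials below are placeholders.
    import psycopg2

    COPY_SQL = """
        COPY analytics.daily_events
        FROM 's3://example-bucket/incoming/events/'
        IAM_ROLE 'arn:aws:iam::123456789012:role/redshift-copy-role'
        FORMAT AS PARQUET;
    """

    def load_s3_to_redshift() -> None:
        # Redshift speaks the PostgreSQL wire protocol, so psycopg2 works as the client.
        conn = psycopg2.connect(
            host="example-cluster.abc123.us-east-1.redshift.amazonaws.com",
            port=5439, dbname="warehouse", user="etl_user", password="***",
        )
        try:
            with conn.cursor() as cur:
                cur.execute(COPY_SQL)   # Redshift loads the files directly from S3
            conn.commit()
        finally:
            conn.close()

    if __name__ == "__main__":
        load_s3_to_redshift()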

TECHNICAL SKILLS:
Programming Languages: Python, Spark, PySpark
Databases: SQL, Hive, Pig, SQL Server, MySQL, MS-SQL, MS Access, HDFS, HBase, Teradata, MongoDB, Cassandra, KQL, Azure SQL
Database Design Tools and Data Modelling: Azure Data Factory, Azure Data Lake, Azure Blob Storage, star schema/snowflake schema modelling, AWS Data Pipeline, AWS data lake, S3, HDFS, fact and dimension tables, physical and logical data modelling, normalization and de-normalization techniques
Data Warehousing: Informatica PowerCenter, PowerMart, Data Quality, Big Data, Pentaho, ETL development, Amazon Redshift, IDQ
Tools and Techniques: Python (Jupyter Notebook, PyCharm), Microsoft Office Suite, Azure (Microsoft Remote Desktop, CLI), Microsoft SQL Server, HDFS, Hadoop MapReduce, Docker, Azure Databricks, Tableau, JIRA, ETL DataStage 8.1, Power BI, SVM, GitHub, Bitbucket
Operating Systems: Windows, Linux, UNIX

EDUCATION:
Bachelor's degree in Computer Science Engineering, Jawaharlal Nehru Technology University, Hyderabad                May 2009 - Apr 2013

WORK EXPERIENCE:

Client: Optum - Eden Prairie, MN                                                Oct 2022 - Present
Role: Azure Data Engineer
Responsibilities:
Designed and implemented data ingestion patterns for standardizing disparate data sources into a unified enterprise-level benchmarking and comparison framework.
Executed ETL operations in Azure Data Factory, leveraging JDBC connectors to connect to various relational database source systems.
Deployed Azure IaaS virtual machines and cloud services into secure virtual networks and subnets, ensuring robust security and scalability.
Implemented the Lakehouse architecture on Azure Data Lake Storage Gen2, combining the benefits of data warehousing and data lakes.
Developed standardized data ingestion patterns for disparate sources, enabling enterprise-level benchmarking and comparison capabilities.
Demonstrated expertise in data source analysis, data standards implementation, data quality maintenance, and master data management.
Built Databricks notebooks to extract data from diverse sources like DB2 and Teradata, performing data cleansing, wrangling, and ETL processing, and loading the results into Azure SQL DB.
Orchestrated data pipelines using Azure Data Factory and developed a custom alerts platform for real-time monitoring.
Created Spark applications using PySpark and Spark SQL for data extraction, transformation, and aggregation from multiple file formats, uncovering insights into customer usage patterns.
Automated file validations in Databricks using Python scripts and ADF, ensuring data quality and integrity.
Optimized SQL scripts using PySpark SQL for enhanced performance and efficiency.
Worked with various data formats like JSON, Parquet, and Delta, reading and writing data from diverse sources using PySpark.
Developed Spark applications in Python to load large CSV files with different schemas into PySpark DataFrames, processing and reloading the data into Azure SQL DB tables (a sketch of this pattern follows this section).
Analyzed data in its native environment by mounting Azure Data Lake and Blob storage.
Crafted complex SQL queries using stored procedures, common table expressions, and temporary tables to support Power BI reports.
Collaborated with the enterprise data modeling team on logical model creation.
Utilized Microsoft Azure to provide data movement and scheduling functionality for cloud-based technologies like Azure Blob Storage and Azure SQL Database.
Independently managed ETL process development, from design to delivery.
Built ETL data pipelines using Azure Data Factory to ingest data from Blob storage into Azure Data Lake Gen2.
Designed and developed user interfaces, customizing reports with Tableau and designing cubes for data visualization and presentation.
Worked on database design, relational integrity constraints, OLAP, OLTP, cubes, normalization (3NF), and de-normalization.
Created mappings and sessions based on business requirements and rules to load data from source flat files and RDBMS tables into target tables.
Developed custom alert queries in Log Analytics, leveraging webhook actions for automation.
Environment: MS SQL, Python libraries, PL/SQL, MDM, SQL Server, DB2, Azure, Data Factory, Azure Data Lake, Azure SQL, Azure Blob Storage, Azure Databricks, Power BI.
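
As a rough illustration of the CSV-to-Azure SQL DB loads described above, the sketch below shows the PySpark read/clean/JDBC-write pattern. The ADLS path, JDBC URL, table, and column names are assumed placeholders, and the Microsoft SQL Server JDBC driver jar is assumed to be on the cluster classpath.

    # Illustrative PySpark sketch: load CSV files, apply basic cleansing, write to Azure SQL DB.
    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.appName("csv_to_azure_sql").getOrCreate()

    raw = (spark.read
           .option("header", "true")
           .option("inferSchema", "true")
           .csv("abfss://landing@examplelake.dfs.core.windows.net/claims/*.csv"))

    cleaned = (raw
               .dropDuplicates(["claim_id"])                  # basic cleansing
               .filter(F.col("claim_amount").isNotNull())
               .withColumn("load_date", F.current_date()))    # audit column

    (cleaned.write
     .format("jdbc")
     .option("url", "jdbc:sqlserver://example-srv.database.windows.net:1433;databaseName=claims_db")
     .option("dbtable", "dbo.claims_stage")
     .option("user", "etl_user")
     .option("password", "***")
     .option("driver", "com.microsoft.sqlserver.jdbc.SQLServerDriver")
     .mode("append")
     .save())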

Client: Lululemon - Dallas, TX                                                  May 2021 - Sept 2022
Role: Senior Data Engineer
Responsibilities:
Developed large-scale data processing pipelines to handle petabytes of transaction data and egress it to analytical sources.
Extracted, parsed, cleaned, and ingested incoming web feed data and server logs into HDInsight and Azure Data Lake Store, managing both structured and unstructured data.
Created scripts to extract and process POS sales data from an SFTP server into a Hive data warehouse using Spark and Python.
Implemented Spark best practices to process data efficiently, using partitioning, resource tuning, memory management, and checkpointing to meet ETAs.
Extensively used Azure services, including ADLS for data storage and ADF pipelines for triggering resource-intensive jobs.
Used Azure Synapse to manage processing workloads and serve data for BI and prediction needs.
Developed Big Data solutions focused on pattern matching and predictive modeling.
Created Hive external tables for staging data and moving data from staging to main tables.
Implemented Big Data solutions using Hadoop, Hive, and Informatica to pull and load data into HDFS.
Pulled data from the data lake and processed it with various RDD transformations.
Developed Scala scripts and UDFs using both DataFrames/SQL and RDD/MapReduce in Spark for data aggregation, queries, and writing data back into RDBMS through Sqoop.
Designed custom process transformations via Azure Data Factory and automation pipelines, extensively using Azure services like ADF and Logic Apps for ETL processes, pushing data between databases, Blob storage, HDInsight HDFS, and Hive tables.
Built PySpark pipelines for validating tables in Oracle and Hive.
Designed Python APIs to connect to Azure Data Lake services for storing and retrieving mobile data.
Developed integration application automation to handle failover scenarios without manual triggers.
Created Spark and Hive jobs to summarize and transform Parquet and JSON data.
Implemented passive audit checks after ingesting data into external Hive tables by matching source file record counts with Hive table counts (a sketch of this check follows this section).
Configured active audit frameworks before ingesting files into HDFS, enabling filename checks, record count checks, file size checks, duplicate checks, missing file checks, and zero-byte checks.
Environment: Azure services, Databricks, Cloudera, PySpark, Hadoop, Hive, StreamSets, Terraform, JIRA, Jenkins.
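
The passive audit check noted above reduces to comparing the record count of the landed source files with the row count of the external Hive table. The sketch below illustrates one way to do this in PySpark; the paths, table name, and partition value are placeholders.

    # Illustrative post-ingest audit: source file record count vs. Hive table row count.
    from pyspark.sql import SparkSession

    spark = (SparkSession.builder
             .appName("post_ingest_audit")
             .enableHiveSupport()
             .getOrCreate())

    source_count = spark.read.parquet("/data/landing/pos_sales/2022-09-01/").count()
    table_count = spark.sql(
        "SELECT COUNT(*) AS cnt FROM sales.pos_sales_ext WHERE load_dt = '2022-09-01'"
    ).first()["cnt"]

    if source_count != table_count:
        # Fail the run so the orchestrator can alert and retry the load.
        raise RuntimeError(
            f"Audit mismatch: source files={source_count}, hive table={table_count}"
        )
    print(f"Audit passed: {source_count} records")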

Client: Ford Motor Company - Memphis, TN                                        Sept 2019 - April 2021
Role: Big Data Engineer
Responsibilities:
Utilized the Azure SQL Database Import and Export Service for efficient data management.
Worked extensively with Azure cloud services, including Azure Synapse Analytics, SQL Azure, Data Factory, Azure Databricks, Application Insights, Azure Monitoring, Key Vault, and Azure Data Lake.
Developed solutions on the Cloudera distribution.
Created tabular models on Azure Analysis Services to meet business reporting requirements.
Developed Python, PySpark, and Bash scripts to transform and load data across on-premises and cloud platforms.
Leveraged Apache Spark, including the Spark SQL and Streaming components, to support intraday and real-time data processing.
Managed data ingestion into various Azure services, such as Azure Data Lake, Azure Storage, Azure SQL, and Azure DW, and performed cloud migration using Azure Databricks.
Created pipelines, data flows, and complex data transformations using Azure Data Factory and PySpark with Databricks.
Gained a deep understanding of structured data sets, data pipelines, ETL tools, and data reduction, transformation, and aggregation techniques, including tools like dbt and DataStage.
Worked with Azure Blob and Data Lake storage, and loaded data into Azure Synapse Analytics (SQL DW).
Implemented Snowpipe for continuous data loads from staged data on cloud gateway servers.
Developed Spark code using Scala and Spark SQL/Streaming for faster data processing.
Performed Hive test queries on local sample files and HDFS files.
Used Spark Streaming to process streaming data in micro-batches within the Spark engine.
Developed transformation logic using Snowpipe and worked hands-on with Snowflake utilities, SnowSQL, and Snowpipe, applying Big Data modeling techniques using Python and Java.
Created ETL pipelines for data warehouses using Python and Snowflake's SnowSQL, and wrote SQL queries against Snowflake (a sketch of a Python-driven Snowflake load follows this section).
Analyzed Hadoop clusters and utilized Big Data analytic tools like Pig, Hive, HBase, Spark, and Sqoop.
Developed Spark applications using Scala and implemented Apache Spark data processing projects to handle data from various RDBMS and streaming sources.
Developed ETL processes using Spark, Scala, Hive, and HBase, and set up clusters and jobs for Azure Databricks.
Utilized visualization tools like Power View for Excel and Tableau for visualizing data and generating reports.
Worked with NoSQL databases such as HBase and MongoDB, and applied data cleansing techniques using Informatica.
Analyzed data profiling results and performed various data transformations.
Wrote Python scripts to parse JSON documents and load data into databases.
Generated weekly and bi-weekly reports using Business Objects and documented them for the client business team.
Environment: Hadoop, Azure, Azure Data Lake, Azure Databricks, Snowflake, Scala, ETL, Hive, Python, Maven, MySQL, Spark, Informatica, IDQ Informatica Developer Tool HF3
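
The sketch below illustrates a Python-driven Snowflake load of the kind described above, using the Snowflake Python connector to run a COPY INTO from a named stage (the batch equivalent of what Snowpipe automates for continuous loads). Account, warehouse, stage, and table names are placeholders.

    # Illustrative Python sketch of a Snowflake batch load from a named stage.
    import snowflake.connector

    conn = snowflake.connector.connect(
        account="example_account",
        user="etl_user",
        password="***",
        warehouse="LOAD_WH",
        database="ANALYTICS",
        schema="STAGING",
    )
    try:
        cur = conn.cursor()
        # Pull newly staged files into the target table; Snowpipe runs the same
        # kind of COPY automatically for continuous loads.
        cur.execute("""
            COPY INTO STAGING.VEHICLE_TELEMETRY
            FROM @STAGING.TELEMETRY_STAGE
            FILE_FORMAT = (TYPE = PARQUET)
            MATCH_BY_COLUMN_NAME = CASE_INSENSITIVE
        """)
        print(cur.fetchall())   # per-file load results returned by COPY INTO
    finally:
        conn.close()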

Client: Broadridge Financials - Irving, TX                                      May 2018 - Aug 2019
Role: Data Engineer
Responsibilities:
Involved in requirements gathering, business analysis, design, development, testing, and implementation of business rules.
Understood business use cases and integration needs, and wrote business and technical requirements documents, logic diagrams, process flow charts, and other application-related documents.
Well versed in working with disparate source systems such as Oracle, SQL Server, Teradata, HDFS, web APIs, and files.
Built data ingestion patterns based on source systems, required refresh frequency, and data type, using Sqoop, ADF, Databricks, and Spark Streaming to ingest data into storage.
Created a Hive UDF (using Cipher with AES) to decrypt the data key with KMS master keys and then decrypt the encrypted source messages on the fly in Spark Structured Streaming for further transformations, such as flattening JSON and extracting specific fields per the use case (a sketch of this pattern follows this section).
Implemented performance optimizations on databases and tuned SQL queries using partitioning, bucketing, and indexing techniques.
Worked on administration activities such as installation, access control, and configuration of clusters using Apache Hortonworks, Azure HDInsight, and Azure Databricks.
Planned and executed knowledge transfer and training sessions with business users to ensure they take full advantage of the solution.
Extensively used Python libraries for data analysis and applied efficient methods for handling null values, missing values, and outliers.
Involved in the full project lifecycle, including requirements gathering, system design, application development, enhancement, deployment, maintenance, and support.
Involved in logical modeling, physical database design, data sourcing, data transformation, data loading, SQL, and performance tuning.
Provided project development estimates to the business and, upon agreement, delivered the project accordingly.
Developed mappings in Informatica to load data from various sources into the data warehouse, using transformations like Source Qualifier, Expression, Lookup, Aggregator, Update Strategy, and Joiner.
Used various transformations such as Source Qualifier, Aggregator, Lookup, Filter, Sequence Generator, Router, Update Strategy, Expression, Sorter, Normalizer, Stored Procedure, and Union.
Used Informatica PowerExchange to handle change data capture (CDC) data from the source and load it into the data mart following the slowly changing dimensions (SCD) Type II process.
Used PowerCenter Workflow Manager to create workflows and sessions, and used tasks such as Command, Event Wait, Event Raise, and Email.
Designed, created, and tuned physical database objects (tables, views, indexes, PPI, UPI, NUPI, and USI) to support normalized and dimensional models.
Environment: SQL, Tableau, Python, ETL pipelining, SSIS, SSRS, SQL Server, VBA, Excel (pivot tables, INDEX, lookups, formulas), data quality, mapping, database design and architecture, profiling, mining, star and snowflake modelling, batch data processing, Azure, Data Factory, Azure Data Lake, Azure SQL, Azure Blob Storage, Azure Databricks
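
The sketch below illustrates the decrypt-and-flatten streaming pattern described above, using a Python UDF in place of the Hive UDF and a placeholder decryption routine; the Kafka topic, the JSON schema, and the key-management details are assumptions, and the spark-sql-kafka package is assumed to be available.

    # Illustrative PySpark Structured Streaming sketch: decrypt messages, then flatten JSON.
    import base64
    from pyspark.sql import SparkSession, functions as F
    from pyspark.sql.types import StringType, StructField, StructType

    spark = SparkSession.builder.appName("decrypt_stream").getOrCreate()

    def decrypt_payload(ciphertext: str) -> str:
        # Placeholder: the production UDF decrypted with an AES cipher whose data
        # key was first unwrapped using KMS master keys.
        return base64.b64decode(ciphertext).decode("utf-8")

    decrypt_udf = F.udf(decrypt_payload, StringType())

    schema = StructType([
        StructField("account_id", StringType()),
        StructField("trade_type", StringType()),
    ])

    events = (spark.readStream
              .format("kafka")
              .option("kafka.bootstrap.servers", "broker1:9092")
              .option("subscribe", "encrypted-trades")
              .load())

    flattened = (events
                 .withColumn("plain", decrypt_udf(F.col("value").cast("string")))
                 .withColumn("msg", F.from_json("plain", schema))   # flatten the JSON payload
                 .select("msg.account_id", "msg.trade_type"))

    (flattened.writeStream
     .format("console")
     .outputMode("append")
     .start()
     .awaitTermination())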

Xurmo Technologies - Bangalore, India                                           Oct 2015 - Dec 2017
Role: Hadoop Developer
Responsibilities:
Developed ETLs using Informatica Cloud Services (ICS) and third-party data connectors (e.g., Salesforce, Zuora, Oracle EBS) with change data capture.
Exported and imported data between Teradata and Hive/HDFS using Sqoop and the Hortonworks Connector for Teradata.
Worked with the Spark ecosystem, using Spark SQL and Scala queries on different formats such as text and CSV files.
Handled Kafka producer and consumer API configuration, upgrades and rolling upgrades, topic-level configs, Kafka Connect configs, stream configs, consumer rebalancing, operations, replication, message delivery semantics, and end-to-end batch compression (a basic producer/consumer configuration is sketched after this section).
Worked on Informatica PowerCenter and PowerExchange, integrating with different applications and relational databases.
Built an AWS CI/CD data pipeline and an AWS data lake using EC2, AWS Glue, and AWS Lambda.
Worked with different API endpoint types (edge-optimized, regional, and private) in AWS API Gateway.
Configured access controls at different levels, such as API key, method level, and account level.
Experienced in defining job flows and in managing and reviewing Hadoop log files.
Applied AWS API Gateway protection strategies such as resource policies, IAM, Lambda authorizers, and Cognito authentication.
Developed Spark code using Scala and Spark SQL for faster testing and data processing.
Built ETL data pipelines on Hadoop/Teradata using Pig, Hive, and UDFs.
Utilized Kubernetes and Docker as the runtime environment for the CI/CD system to build, test, and deploy.
Implemented a DevOps culture through CI/CD tools such as Repos, CodeDeploy, CodePipeline, and GitHub.
Created a continuous integration and continuous delivery (CI/CD) pipeline on AWS to automate steps in the software delivery process.
Environment: AWS, ETL, Informatica PowerCenter, CI/CD, Hadoop, Teradata, Pig, Hive, UDFs, Kubernetes, Docker, Git
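
The sketch below illustrates the kind of producer/consumer configuration described above, using the kafka-python client: batch compression and acknowledgment settings on the producer, and a consumer group with manual offset commits for at-least-once processing. Broker addresses, the topic name, and the group id are placeholders.

    # Illustrative kafka-python producer/consumer configuration sketch.
    from kafka import KafkaConsumer, KafkaProducer

    producer = KafkaProducer(
        bootstrap_servers=["broker1:9092"],
        compression_type="gzip",   # end-to-end batch compression
        acks="all",                # wait for all in-sync replicas before acking
        retries=3,
    )
    producer.send("orders", b'{"order_id": 1, "qty": 2}')
    producer.flush()

    consumer = KafkaConsumer(
        "orders",
        bootstrap_servers=["broker1:9092"],
        group_id="orders-etl",          # consumer group; the broker coordinator handles rebalancing
        enable_auto_commit=False,       # commit manually for at-least-once processing
        auto_offset_reset="earliest",
        consumer_timeout_ms=10000,      # stop iterating after 10s of silence (demo only)
    )
    for record in consumer:
        print(record.topic, record.partition, record.offset, record.value)
        consumer.commit()               # commit offsets only after the message is handled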

Lumbini Elite Solutions - Bangalore, India                                      May 2013 - Sept 2015
Role: ETL Developer
Responsibilities:
Interacted with the business community and gathered requirements based on changing needs.
Wrote SQL queries, triggers, procedures, macros, packages, and shell scripts to apply and maintain the business rules.
Performed error handling using error tables and log files.
Used Informatica Designer to create complex mappings using transformations such as Filter, Router, connected and unconnected Lookups, Stored Procedure, Joiner, Update Strategy, Expression, and Aggregator to pipeline data to the data warehouse.
Performed DML and DDL operations with the help of the SQL transformation in Informatica.
Collaborated with the Informatica admin on the PowerCenter version upgrade.
Used the SQL transformation for sequential loads in Informatica PowerCenter ETL processes.
Developed mappings in Informatica to load data from various sources into the data warehouse, using transformations like Source Qualifier, Expression, Lookup, Aggregator, Update Strategy, and Joiner.
Worked on advanced Informatica concepts, including pushdown optimization and pipeline partitioning.
Used various transformations such as Source Qualifier, Aggregator, Lookup, Filter, Sequence Generator, Router, Update Strategy, Expression, Sorter, Normalizer, Stored Procedure, and Union.
Used Informatica PowerExchange to handle change data capture (CDC) data from the source and load it into the data mart following the slowly changing dimensions (SCD) Type II process.
Used PowerCenter Workflow Manager to create workflows and sessions, and used tasks such as Command, Event Wait, Event Raise, and Email.
Created a cleanup process for removing all the intermediate temp files used prior to the loading process (a sketch of this step follows this section).
Worked closely with the business analysts' team to resolve problem tickets and service requests, and supported the 24/7 production support team.
Environment: Oracle, SQL, PL/SQL, SQL*Plus, HP-UX, Informatica PowerCenter, DB2, Cognos ReportNet, Windows
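
The cleanup step mentioned above can be as simple as the following Python sketch, which removes aged intermediate files from a staging directory after a load completes; the directory, file pattern, and retention window are illustrative placeholders.

    # Illustrative temp-file cleanup sketch for intermediate ETL staging files.
    import time
    from pathlib import Path

    STAGING_DIR = Path("/data/etl/staging/tmp")
    MAX_AGE_SECONDS = 24 * 60 * 60   # keep only files from the current day

    def cleanup_temp_files() -> int:
        removed = 0
        cutoff = time.time() - MAX_AGE_SECONDS
        for path in STAGING_DIR.glob("*.tmp"):
            if path.is_file() and path.stat().st_mtime < cutoff:
                path.unlink()       # delete the stale intermediate file
                removed += 1
        return removed

    if __name__ == "__main__":
        print(f"Removed {cleanup_temp_files()} intermediate temp files")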
