Azure Data Engineer Resume Fairfax, VA
NAME: Candidate's Name
Title: Data Engineer
Phone: PHONE NUMBER AVAILABLE
LinkedIn: Candidate's Name
Email: EMAIL AVAILABLE

PROFESSIONAL SUMMARY
Experienced Data Engineer with 5 years in Information Technology, specializing in Azure cloud services and Big Data technologies such as Spark and Hive. Proficient in ETL development and in deploying Hadoop applications on cloud platforms including Azure and AWS. Skilled in optimizing query performance, developing Spark scripts, and implementing EDW solutions. Strong expertise in Azure services, data modeling, and CI/CD practices, with a track record of leading projects from planning to deployment in Agile environments.

TECHNICAL SKILLS
Big Data Technologies: HDFS, MapReduce, Tez, Hive 2, YARN, Airflow, Oozie, Sqoop, HBase, Ranger, DAS, Atlas, Ranger KMS, Druid, Spark 2, Hive LLAP, Knox, SAM, NiFi, NiFi Registry, Kafka
Hadoop Distributions: Cloudera, Hortonworks
Azure Services: Azure Blob Storage, Azure Data Lake Analytics, Azure Databricks, Azure Data Factory, Azure Synapse Analytics, Azure Cosmos DB, Azure Kubernetes Service (AKS), Azure SQL Database, Azure Event Hubs, Azure VMs, Azure Storage, Azure Active Directory, Azure HDInsight, Azure Monitor, Azure Purview, Azure Key Vault, RBAC
Languages: Java, SQL, PL/SQL, Python, HiveQL, Scala, Node.js, TypeScript
Web Technologies: HTML, CSS, JavaScript, XML, JSP, RESTful, SOAP
Operating Systems: Windows (XP/7/8/10), UNIX, Linux, Ubuntu, CentOS
Build Automation/Atlassian Tools: Ant, Maven, JIRA, Confluence
Version Control: Git, GitHub, GitFlow, Bitbucket
IDEs & Build Tools: Eclipse, Visual Studio
Databases: MS SQL Server 2016/2014/2012, SSIS, SSRS, SSMS, Azure SQL DB, Azure Synapse, MS Excel, MS Access, Oracle 11g/12c, Cosmos DB, PostgreSQL

WORK EXPERIENCE
Client: Food Lion, United States    May 2022 - Feb 2024
Role: Data Engineer
Responsibilities:
- Orchestrated all phases of the data engineering process, including requirements gathering, architecture design, implementation, and testing, within the AWS ecosystem.
- Led all phases of the SDLC, including requirements analysis, application design, code development, and testing.
- Wrote SQL queries involving DDL, DML, indexes, triggers, views, stored procedures, functions, and packages for effective database management.
- Implemented data partitioning and bucketing techniques in Hadoop to optimize query performance and reduce processing overhead (see the sketch below).
- Designed and implemented fault-tolerant data replication strategies in Hadoop clusters, ensuring data durability and high availability.
- Integrated Apache HBase with the Hadoop ecosystem for real-time data storage and retrieval, enabling low-latency access to massive datasets.
- Implemented data governance policies and access controls using Apache Ranger, ensuring compliance with regulatory requirements.
- Conducted capacity planning and performance tuning for Hadoop clusters, optimizing resource allocation and cluster utilization.
- Designed and implemented disaster recovery solutions for Hadoop infrastructure, leveraging tools like Apache Falcon and DistCp.
- Migrated legacy data warehouse systems to Hadoop-based platforms, leveraging tools like Apache Sqoop for seamless data transfer.
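To illustrate the partitioning and bucketing noted above, a minimal sketch using Spark SQL DDL; the table, columns, and bucket count are hypothetical, not taken from the engagement:

```python
# Hypothetical sketch: a partitioned, bucketed table created through Spark SQL.
# Table name, columns, and bucket count are illustrative only.
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("partitioning-bucketing-sketch")
         .getOrCreate())

# Partition by date so queries filtering on order_date prune whole directories;
# bucket by customer_id so joins on that key avoid a full shuffle.
spark.sql("""
    CREATE TABLE IF NOT EXISTS sales (
        order_id    BIGINT,
        customer_id BIGINT,
        amount      DECIMAL(10, 2),
        order_date  DATE
    )
    USING ORC
    PARTITIONED BY (order_date)
    CLUSTERED BY (customer_id) INTO 32 BUCKETS
""")

# A query that benefits from partition pruning: only the matching
# order_date partition is scanned.
spark.sql("""
    SELECT customer_id, SUM(amount) AS total
    FROM sales
    WHERE order_date = DATE'2023-06-01'
    GROUP BY customer_id
""").show()
```

Partition pruning pays off only when queries filter on the partition column; the bucketing helps joins keyed on customer_id skip the shuffle.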
- Developed and optimized ETL pipelines using SQL and Apache Spark, handling complex data transformations and ensuring seamless data flow from diverse sources to target databases.
- Implemented scalable solutions for processing large datasets with Python and Hadoop, incorporating advanced ETL techniques to cleanse and validate data throughout the pipeline.
- Designed and optimized SSIS packages for incremental data loading and Change Data Capture (CDC), maintaining data consistency and minimizing processing time for daily batch updates.
- Designed and implemented ETL processes using SSIS to cleanse, transform, and load data into Azure Data Lake Storage, ensuring data consistency and integrity across systems.
- Collaborated with cross-functional teams to design and deploy robust data architectures, using tools like Airflow and Talend to orchestrate intricate workflows and streamline data movement.
- Enhanced data warehouse performance by fine-tuning SQL queries and indexing strategies, reducing query execution time and improving overall system efficiency.
- Integrated third-party tools such as Apache Airflow and Talend within ETL pipelines to enhance data orchestration capabilities and automate routine tasks.
- Proficient in writing complex SQL queries to extract, manipulate, and manage data in relational databases such as Oracle, MySQL, and SQL Server.
- Implemented dynamic DAG generation in Airflow using Jinja2-style templating, enabling flexible and scalable workflow definitions (see the sketch below).
- Leveraged Scala libraries such as Algebird and Cats Effect to implement advanced data processing and manipulation tasks within Airflow-orchestrated workflows.
- Integrated Airflow with distributed scheduling frameworks like Apache Mesos and Kubernetes, ensuring efficient resource utilization and scalability.
- Implemented custom Airflow executors to support parallel task execution and resource isolation in large-scale data processing pipelines.
- Developed monitoring and alerting for Airflow workflows, using Prometheus and Grafana for performance metrics visualization.
- Used Scala-based DSLs such as Apache Beam's Scio to express data transformation pipelines declaratively, enhancing code readability and maintainability.
- Optimized PySpark jobs by fine-tuning Spark configurations and leveraging partitioning and caching techniques, leading to significant improvements in processing speed and resource utilization (see the sketch at the end of this engagement).
- Designed and implemented real-time streaming data pipelines using PySpark and Apache Kafka, handling high-volume data streams and performing near real-time analytics for actionable insights.
- Used Python's pandas library to perform complex transformations and manipulations on large datasets retrieved from different sources, ensuring data consistency and accuracy throughout the ETL process.
- Integrated Python visualization libraries such as Matplotlib and Plotly into data pipelines to generate interactive visualizations and dashboards, supporting data-driven decision-making.
- Developed Spark Core and Spark SQL scripts in Scala for faster data processing.
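Dynamic DAG generation is conventionally done in Airflow's Python layer; a minimal sketch of the pattern mentioned above, with hypothetical source names and a placeholder load step:

```python
# Hypothetical sketch: generate one Airflow DAG per source system from a
# config list. Source names and the load function are illustrative only.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

SOURCES = ["orders", "inventory", "customers"]  # assumed source tables

def load_source(source_name: str) -> None:
    # Placeholder for the real extract/load logic.
    print(f"loading {source_name}")

for source in SOURCES:
    dag_id = f"ingest_{source}"
    with DAG(
        dag_id=dag_id,
        start_date=datetime(2023, 1, 1),
        schedule_interval="@daily",
        catchup=False,
    ) as dag:
        PythonOperator(
            task_id=f"load_{source}",
            python_callable=load_source,
            op_kwargs={"source_name": source},
        )
    # Register each generated DAG at module scope so the scheduler finds it.
    globals()[dag_id] = dag
```

Adding a source then becomes a one-line config change rather than a new DAG file.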
- Worked with JIRA to report on projects and created subtasks for development, QA, and partner validation.
- Implemented efficient CI/CD pipelines using Azure DevOps, leveraging Git for version control and Jenkins for automated builds, ensuring seamless integration and deployment of data solutions.
- Orchestrated Agile development processes within the team, using Jira and Confluence for sprint planning, backlog management, and collaborative documentation, fostering a cohesive and productive work environment.
- Collaborated with DevOps teams to implement advanced CI/CD capabilities such as progressive delivery and chaos engineering for Big Data applications, improving resilience and reliability in production environments.
- Performed code reviews and used GitFlow for branching and collaboration.
- Worked in an Agile methodology with daily and weekly releases, using JIRA and Confluence for timely tracking of task progress and documentation of processes.
- Participated in the full breadth of Agile ceremonies, from daily stand-ups to internationally coordinated PI Planning.

Environment: Azure Databricks, Data Factory, Logic Apps, Log Analytics, Function Apps, Azure ML, Azure Cosmos DB, Snowflake, MS SQL, Oracle, HDFS, MapReduce, Azure DevOps, YARN, Spark, Hive, SQL, Python, pandas, NumPy, Scala, PySpark, shell scripting, Git, JIRA, Jenkins, Kafka, ADF pipelines, Power BI, Azure, Azure AD Connect, ActiveSync, Confidential, Exchange 2013, Office 365, PowerShell cmdlets, Power Query.
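As an illustration of the PySpark job tuning described in this engagement (shuffle-partition configuration, repartitioning on the join key, caching a reused result), a minimal sketch; the paths and column names are hypothetical:

```python
# Hypothetical sketch of PySpark tuning: explicit shuffle-partition
# configuration, repartitioning on the join key, and caching a DataFrame
# that feeds several downstream actions. Paths and columns are made up.
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("tuning-sketch")
         .config("spark.sql.shuffle.partitions", "200")  # sized to the cluster
         .getOrCreate())

orders = spark.read.parquet("/data/orders")        # assumed input path
customers = spark.read.parquet("/data/customers")  # assumed input path

# Repartition both sides on the join key so the join shuffles evenly.
orders = orders.repartition(200, "customer_id")
customers = customers.repartition(200, "customer_id")

joined = orders.join(customers, "customer_id")

# Cache because the joined result is reused by multiple aggregations.
joined.cache()

daily_totals = joined.groupBy("order_date").sum("amount")
by_region = joined.groupBy("region").count()

daily_totals.show()
by_region.show()

joined.unpersist()
```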
Client: Schlumberger, India    Jun 2018 - Aug 2021
Role: Data Engineer
Responsibilities:
- Involved in the complete data flow of the application, from ingesting upstream data into HDFS to processing and analyzing it.
- Managed the end-to-end Azure Big Data flow, encompassing data ingestion from various sources into Azure Blob Storage, processing with Azure Data Lake Analytics, and analysis with Azure Databricks.
- Configured Azure Data Factory to extract data from web server output files and load it into Azure Blob Storage.
- Created external tables with partitions using Azure Synapse Analytics (formerly SQL Data Warehouse), Azure Data Lake Storage, and Azure Cosmos DB.
- Designed and implemented an Azure environment utilizing services such as Azure Blob Storage, Azure VMs, Azure Functions, Azure Data Factory, Azure Data Lake Storage, Azure Databricks, and Azure SQL Database.
- Integrated Azure Data Lake Storage Gen2 with Databricks Delta Lake for efficient storage and management of big data sets, optimizing data retrieval and processing speeds for AI/ML workloads.
- Implemented Azure Data Explorer for real-time analytics on streaming data ingested into Databricks, enabling near-instantaneous insights and decision-making based on the latest data.
- Collaborated with data architects to design and implement a hybrid data integration solution using Azure Data Factory and SSIS, orchestrating seamless data movement between on-premises SQL Server databases, Azure Blob Storage, and Azure SQL Data Warehouse.
- Leveraged Azure Machine Learning pipelines to automate model training, tuning, and deployment workflows within Databricks, increasing operational efficiency and reducing time-to-market for AI solutions.
- Designed and implemented custom data preprocessing pipelines in Databricks using Apache Spark, integrating with Azure ML for feature engineering and model training, enhancing predictive modeling accuracy.
- Collaborated with data scientists to deploy TensorFlow and PyTorch models on Azure Machine Learning Compute within Databricks, harnessing distributed computing capabilities for scalable deep learning inference.
- Implemented Azure Data Share to securely share curated datasets from Databricks with external partners and stakeholders, ensuring data privacy and compliance while fostering collaboration.
- Implemented real-time data streaming pipelines using Azure Event Hubs and Azure Data Factory to ingest and process IoT sensor data, integrating with SSIS for data cleansing and enrichment before storing in Azure Cosmos DB, supporting predictive maintenance for a manufacturing client.
- Orchestrated data ingestion pipelines using Azure Event Hubs and Apache Kafka within Databricks, enabling real-time data processing and analysis for AI-driven insights and decision-making (see the sketch below).
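A minimal sketch of the Event Hubs ingestion pattern above, reading through the Event Hubs Kafka-compatible endpoint with Spark Structured Streaming; the namespace, hub name, connection string, and paths are placeholders:

```python
# Hypothetical sketch: consume an Azure Event Hub from Spark Structured
# Streaming via its Kafka-compatible endpoint. All names are placeholders.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col

spark = SparkSession.builder.appName("eventhubs-kafka-sketch").getOrCreate()

namespace = "my-namespace"      # assumed Event Hubs namespace
event_hub = "sensor-readings"   # assumed event hub (Kafka topic) name
connection_string = "<EVENT-HUBS-CONNECTION-STRING>"  # keep in a secret store

# Event Hubs speaks the Kafka protocol on port 9093 with SASL_SSL / PLAIN;
# the SASL username is the literal string "$ConnectionString".
jaas = (
    "org.apache.kafka.common.security.plain.PlainLoginModule required "
    f'username="$ConnectionString" password="{connection_string}";'
)

stream = (spark.readStream
          .format("kafka")
          .option("kafka.bootstrap.servers",
                  f"{namespace}.servicebus.windows.net:9093")
          .option("kafka.security.protocol", "SASL_SSL")
          .option("kafka.sasl.mechanism", "PLAIN")
          .option("kafka.sasl.jaas.config", jaas)
          .option("subscribe", event_hub)
          .load())

# The payload arrives as bytes in the Kafka `value` column.
readings = stream.select(col("value").cast("string").alias("json_payload"))

query = (readings.writeStream
         .format("delta")  # Databricks Delta sink; path is illustrative
         .option("checkpointLocation", "/tmp/checkpoints/sensor-readings")
         .start("/tmp/delta/sensor-readings"))
# query.awaitTermination()  # block here when run as a standalone script
```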
- Developed and deployed custom machine learning scoring pipelines in Azure Functions, integrating with Databricks for model inference and enabling low-latency predictions in production environments.
- Implemented Azure Data Catalog to discover and govern data assets within Databricks, facilitating collaboration and knowledge sharing among data engineers, data scientists, and business users.
- Leveraged Azure Monitor and Azure Log Analytics for monitoring and troubleshooting Databricks clusters and jobs, ensuring high availability and performance of AI/ML workloads in production environments.
- Utilized Azure Data Factory with Databricks integration to orchestrate complex ETL processes, transforming and enriching data for downstream analytics and reporting.
- Developed custom data connectors in Power BI to connect to various data sources, including Azure SQL Database, Synapse Analytics, and third-party APIs.
- Leveraged Azure Machine Learning for predictive analytics, building and deploying machine learning models to derive actionable insights from data.
- Implemented Azure Purview for data governance and compliance, ensuring data lineage, classification, and access control across Azure data services.
- Conducted capacity planning and performance optimization for Azure Synapse workloads, fine-tuning resource allocation and query optimization for optimal performance.
- Automated data validation and quality checks using Azure Data Factory, implementing custom data quality rules and alerts to ensure data integrity.
- Implemented data security and compliance measures within ADF pipelines using PySpark encryption and access control techniques, safeguarding sensitive information and ensuring adherence to regulatory requirements.
- Provisioned and optimized Azure Databricks clusters for high concurrency and performance to accelerate data preparation tasks.
- Developed and maintained Snowflake data models, including schema design, table creation, and optimization for query performance, leveraging Snowflake's automatic scaling and clustering features.
- Utilized Azure Data Catalog to maintain metadata and facilitate data discovery and governance, enabling seamless querying of refined data from Azure Synapse Analytics and Azure Data Lake Storage.
- Utilized Azure HDInsight to process data stored in formats such as Avro, Parquet, JSON, and CSV, applying transformations and aggregations as required.
- Developed and maintained data pipelines using Azure Data Factory, orchestrating data movement and transformation activities across Azure services.
- Implemented CI/CD pipelines using tools like Azure DevOps, Jenkins, and GitLab CI/CD to automate the building, testing, and deployment of containerized applications to AKS clusters.
- Adopted DevOps practices within the organization, championing Azure DevOps for infrastructure as code (IaC) and configuration management, increasing the efficiency and reliability of data solutions.
- Utilized Azure Monitor and Azure Log Analytics to monitor and troubleshoot data pipelines and applications, ensuring optimal performance and reliability in a CI/CD environment.
- Conducted regular training and knowledge-sharing sessions on Agile methodologies and DevOps practices, empowering team members to use Azure DevOps, Git, Jenkins, Jira, and Confluence effectively in their day-to-day work.
- Integrated AKS with container registries such as Azure Container Registry (ACR) and Docker Hub to store and manage container images, enabling seamless image deployments to Kubernetes clusters.
- Orchestrated multi-stage deployment strategies (e.g., blue-green deployments, canary releases) using AKS and Kubernetes tooling such as Helm charts and Kustomize to minimize downtime and risk during application updates.
- Employed Azure Functions for serverless computing, automating data processing tasks and integrating seamlessly with other Azure services.
- Designed and optimized data models in Azure Cosmos DB for efficient querying and scalability, ensuring high performance for real-time applications.
- Leveraged Azure Key Vault for securely storing and managing cryptographic keys, secrets, and certificates used in data encryption and authentication (see the sketch below).

Environment: Hadoop (HDFS, MapReduce), Azure Blob Storage, Log Analytics, Azure Data Lake Analytics, Azure Databricks, Azure Data Factory, Azure Synapse Analytics, Azure DevOps, Snowflake, Azure Cosmos DB, AKS, SQL Database, Azure Active Directory, Azure HDInsight, Python, PySpark, SQL, PostgreSQL, Flink, Jenkins, NiFi, Scala, MongoDB, Cassandra, Sqoop, Hibernate, Spring, Oozie, autoscaling, UNIX shell scripting.
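To illustrate the Azure Key Vault usage above, a minimal sketch using the azure-identity and azure-keyvault-secrets client libraries; the vault URL, secret name, and connection string are hypothetical:

```python
# Hypothetical sketch: fetch a database password from Azure Key Vault at
# pipeline startup instead of hard-coding it. Vault URL and secret name
# are placeholders.
from azure.identity import DefaultAzureCredential
from azure.keyvault.secrets import SecretClient

VAULT_URL = "https://my-vault.vault.azure.net"  # assumed vault

# DefaultAzureCredential resolves managed identity, environment variables,
# or a local az login, in that order.
credential = DefaultAzureCredential()
client = SecretClient(vault_url=VAULT_URL, credential=credential)

# Retrieve the secret; the returned object exposes the value and metadata.
secret = client.get_secret("sql-password")  # assumed secret name
connection = f"Server=myserver;Database=mydb;Password={secret.value}"
```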

EDUCATION
Master's in Computer Science, University of Dayton
Bachelor's in Computer Science and Engineering, Jawaharlal Nehru Technological University, Hyderabad