Candidate's Name
Sr. Data Engineer / ETL Developer
Email: EMAIL AVAILABLE | Phone: PHONE NUMBER AVAILABLE | LinkedIn: LINKEDIN LINK AVAILABLE

PROFESSIONAL SUMMARY:
- 9+ years of IT experience as a Data Engineer / ETL Developer, spanning analysis, architecture, and development.
- Extensive experience designing and deploying scalable data solutions on Azure using Azure Data Factory, Azure Databricks, and Azure Synapse Analytics.
- Proficient in creating and managing data pipelines in Azure Data Factory for ETL processes, ensuring efficient data flow and integration.
- Proficient in designing and implementing data architectures on AWS, using services such as AWS Glue, Amazon Redshift, and Amazon S3 for data warehousing and ETL.
- Experienced in developing and orchestrating complex data pipelines using AWS Glue and AWS Lambda, ensuring seamless data integration and transformation.
- Good experience with Google Cloud Platform (BigQuery, Dataproc, Composer, Cloud Run).
- Skilled in developing robust ETL processes using tools such as Informatica, Talend, and SSIS for seamless data integration.
- Expertise in developing Spark code using Python, Spark SQL, Spark Streaming, and PySpark for faster testing and processing of data (an illustrative sketch follows the Technical Skills section below).
- Skilled in using Spark SQL for querying structured data within Spark applications.
- Experience writing complex Scala Spark transformations and actions for data processing workflows.
- Extensively worked with Hadoop, Hive, and Spark to build ETL and data processing systems spanning a variety of data sources, targets, and formats.
- Proficient in handling CSV, Parquet, and Delta file formats.
- Proficient in monitoring Control-M jobs and promptly resolving job failures to ensure data pipeline reliability.
- Extracted data from MySQL and AWS Redshift into HDFS using Sqoop.
- Integrated Snowflake with various data sources and tools, automating data workflows for streamlined operations.
- Extensive experience developing and deploying machine learning models to optimize business processes and drive data-driven decision-making.
- Experienced with business intelligence and visualization tools (Power BI).
- Designed ETL workflows for Tableau and deployed data from various sources to HDFS.
- Strong ability to collaborate with business stakeholders to understand requirements, design intuitive user interfaces, and deliver actionable insights through QlikView solutions.
- Skilled in leveraging GitHub Actions to automate CI/CD pipelines and improve deployment efficiency.
- Proficient in setting up and managing CI/CD pipelines using GitLab CI to automate testing and deployment processes.
- Experienced in integrating Azure DevOps with various data engineering tools and services for streamlined project workflows.
- Experience using JIRA, Maven, Jenkins, and Git for version control and issue tracking.
- Good interpersonal and communication skills, strong problem-solving abilities, and strong analytical judgment.

TECHNICAL SKILLS:
Programming Languages: Python, Scala, SQL, PL/SQL, R, DAX
ETL Tools: Azure Data Factory, AWS Glue, Informatica, Talend, SSIS, Apache Flume, Sqoop, Airflow
Big Data Frameworks: Hadoop, Spark, Hive, HDFS, MapReduce, YARN
Data Warehousing: Snowflake, Redshift, AWS S3, Azure SQL Database, Azure Cosmos DB
Data Processing: Spark, PySpark, Spark SQL, Scala, Spark Streaming, Hadoop
Cloud Platforms: Azure, AWS
Machine Learning: Azure Databricks, MLlib, Python, R, SAS
Business Intelligence & Visualization: Power BI, Tableau, QlikView, Sisense
Version Control & CI/CD: Git, GitHub, GitLab, Jenkins, GitHub Actions, Azure DevOps
Database Management: MySQL, PostgreSQL, SQL Server, Oracle, Aurora, Exadata
Data Analysis & Visualization Tools: Matplotlib, Seaborn, Jupyter Notebooks, DAX, Power Pivot
DevOps Tools: Docker, Kubernetes, Jenkins, GitLab CI, Azure DevOps
Data Modeling: ERD, PowerDesigner, Erwin
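For illustration only: a minimal sketch of the kind of PySpark / Spark SQL transformation work described in the summary above. The application name, file paths, table name, and column names are hypothetical placeholders, not details from any actual engagement.

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

# Hypothetical example: read Parquet, aggregate with the DataFrame API,
# express the same aggregation through Spark SQL, and write the result out.
spark = SparkSession.builder.appName("sample-etl").getOrCreate()

orders = spark.read.parquet("s3://example-bucket/raw/orders/")  # illustrative path

daily_totals = (
    orders
    .filter(F.col("status") == "COMPLETE")
    .withColumn("order_date", F.to_date("order_ts"))
    .groupBy("order_date", "region")
    .agg(
        F.sum("amount").alias("total_amount"),
        F.countDistinct("customer_id").alias("unique_customers"),
    )
)

# The same aggregation via Spark SQL on a temporary view.
orders.createOrReplaceTempView("orders")
daily_totals_sql = spark.sql("""
    SELECT to_date(order_ts) AS order_date,
           region,
           SUM(amount) AS total_amount,
           COUNT(DISTINCT customer_id) AS unique_customers
    FROM orders
    WHERE status = 'COMPLETE'
    GROUP BY to_date(order_ts), region
""")

daily_totals.write.mode("overwrite").parquet("s3://example-bucket/curated/daily_totals/")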
PROFESSIONAL EXPERIENCE:

Client: California Department of Public Health, Sacramento, CA. Jul 2021 - Present
Role: Senior ETL Developer
Responsibilities:
- Developed data pipelines in Azure Data Factory to orchestrate ETL processes across Azure SQL Database and Azure Synapse Analytics.
- Utilized Azure Blob Storage to store and manage structured and unstructured data at scale.
- Designed and orchestrated data pipelines in Azure Data Factory to move and transform data across various Azure services and on-premises sources.
- Managed Azure Data Lake Storage (ADLS) and Data Lake Analytics, with an understanding of how to integrate them with other Azure services.
- Used Azure DevOps and Jenkins pipelines to build and deploy different resources (code and infrastructure) in Azure.
- Created complex data flows using Azure Data Factory (ADF) Mapping Data Flows to perform data transformations, including joins, aggregations, and lookups.
- Used Azure Data Factory monitoring features to track pipeline performance, analyze activity runs, and optimize workflows for better efficiency.
- Transitioned from traditional ETL to ELT processes using Azure Databricks and Snowflake, leveraging their scalability and processing power to load raw data directly and perform transformations within the target database.
- Implemented security best practices for Azure Event Hubs, including Shared Access Signatures (SAS) and Azure Active Directory authentication, to secure data transmission.
- Conducted performance tuning and scaling strategies for Event Hubs to accommodate varying workloads and optimize latency in data processing.
- Integrated Azure Cosmos DB with other Azure services such as Azure Functions, Azure Logic Apps, and Azure Data Factory for seamless data processing and integration workflows.
- Created and provisioned the Databricks clusters needed for batch and continuous streaming data processing, and installed the required libraries for the clusters.
- Utilized Python libraries (e.g., Pandas, NumPy) in Azure Synapse Notebooks to enhance data analysis and manipulation capabilities, complementing KQL queries for comprehensive data insights.
- Conducted exploratory data analysis using Python Matplotlib and Seaborn to identify underlying patterns and correlations between features.
- Used Python (NumPy, SciPy, Pandas, Scikit-Learn, Seaborn) and Spark 2.0 (Spark SQL, MLlib) to develop a variety of models and algorithms for analytic purposes.
- Designed and optimized Spark SQL queries for aggregating and analyzing large datasets.
- Optimized Scala Spark jobs for data transformation, cleansing, and enrichment.
- Wrote real-time processing and core jobs using Spark Streaming with Kafka as the data pipeline system.
- Used Spark and Spark SQL to read Parquet data and create tables in Hive using the Scala API.
- Developed a preprocessing job using Spark DataFrames to flatten JSON documents into flat files (see the illustrative sketch at the end of this section).
- Managed and optimized Hadoop clusters, ensuring efficient resource allocation and minimizing job execution times.
- Monitored and troubleshot Control-M job schedules to maintain system performance and minimize downtime.
- Configured and optimized Talend components and transformations to enhance job performance, including tuning data flow and leveraging parallel processing capabilities.
- Integrated Talend with various databases and applications, including relational databases, file systems, and web services, to support diverse data integration and transformation needs.
- Implemented and optimized Snowflake data warehouse solutions for healthcare clients, leveraging Snowflake's scalable architecture and performance features to handle large volumes of clinical and administrative data and improving data access and query performance by up to 40%.
- Created custom Snowflake data models for advanced healthcare analytics, including predictive modeling and cohort analysis.
- Implemented granular access control policies in Snowflake, leveraging role-based access management to safeguard sensitive healthcare data.
- Led the migration of legacy data warehouses to Snowflake, modernizing infrastructure to enhance data processing and scalability.
- Mapped complex relationships between entities in large-scale databases, using advanced Erwin features to visualize and communicate intricate data structures to stakeholders.
- Leveraged advanced data modeling methodologies such as dimensional modeling, data vault modeling, and canonical data models to address complex business requirements.
- Developed a comprehensive metadata management framework, leveraging Erwin's metadata repository to ensure consistent documentation and enhance data lineage tracking.
- Managed and optimized MS SQL Server databases, ensuring high availability and performance.
- Developed complex T-SQL queries, stored procedures, and functions.
- Managed relational databases using MySQL, PostgreSQL, and Aurora, ensuring high availability and performance.
- Implemented job scheduling and monitoring through DataStage Director, ensuring timely execution and prompt resolution of job failures, which improved operational reliability.
- Optimized MySQL database performance through indexing and query optimization techniques.
- Designed and implemented advanced indexing strategies (e.g., composite, partial, and columnstore indexes), resulting in a 30% improvement in query performance.
- Spearheaded the partitioning of large tables using range and hash partitioning techniques, which reduced query response times by up to 50%.
- Authored and executed comprehensive unit test scripts for ETL processes, ensuring functionality and data integrity; developed test cases to validate data transformations and adherence to business requirements.
- Designed parameterized stored procedures to facilitate dynamic data retrieval, improving security and reducing the risk of SQL injection attacks.
- Designed and implemented PowerShell-based monitoring solutions for system performance and resource usage, providing real-time alerts and automated responses to potential issues.
- Pulled data into Power BI from various sources, such as Oracle and SQL Server.
- Used the Power BI query editor to perform operations such as fetching data from different file types.
- Experienced with GitHub, GitLab, and CI/CD pipelines.
- Strong experience with essential DevOps tools such as Docker, Kubernetes, Git, and Jenkins.
- Created and maintained Jira dashboards for team visibility on project progress, backlog, and sprint goals.
Environment: Azure Data Factory, Azure Synapse Analytics, Azure Blob Storage, Azure Databricks, Informatica, Python, Spark, Spark SQL, Scala, Hadoop, Control-M, MS SQL Server, T-SQL, MySQL, PostgreSQL, Aurora, Power BI, GitHub, GitLab, CI/CD pipelines, Docker, Kubernetes, Jenkins, Jira.
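For illustration only: a minimal sketch of the kind of preprocessing job mentioned above that flattens nested JSON documents into flat files with Spark DataFrames. The storage paths and field names are hypothetical placeholders, not actual client data structures.

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

# Hypothetical example: read nested JSON, promote struct fields to top-level
# columns, explode an array of structs, and write the flattened result as CSV.
spark = SparkSession.builder.appName("flatten-json").getOrCreate()

raw = spark.read.json("abfss://raw@examplestorage.dfs.core.windows.net/records/")

flat = (
    raw
    .select(
        F.col("id"),
        F.col("profile.age").alias("age"),
        F.col("profile.zip_code").alias("zip_code"),
        F.explode_outer("events").alias("event"),   # keep rows with empty arrays
    )
    .select(
        "id", "age", "zip_code",
        F.col("event.date").alias("event_date"),
        F.col("event.code").alias("event_code"),
    )
)

(flat.write
     .mode("overwrite")
     .option("header", True)
     .csv("abfss://curated@examplestorage.dfs.core.windows.net/records_flat/"))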
Client: Mastercard, Purchase, NY. Dec 2018 - Jun 2021
Role: ETL Developer
Responsibilities:
- Managed Azure SQL Database and Azure Cosmos DB configurations to optimize performance and scalability for big data solutions.
- Developed and maintained Azure Databricks clusters for data transformation and advanced analytics workflows.
- Developed serverless applications using Azure Functions triggered by events from Event Hubs to process and analyze incoming data in real time.
- Utilized ADF Mapping Data Flows and Data Flow activities to perform data transformations, including data cleansing, aggregation, and enrichment, to support business intelligence and reporting needs.
- Implemented scheduling and orchestration of data workflows using Azure Data Factory triggers and pipelines, ensuring timely and reliable data processing.
- Set up monitoring and alerting for data pipelines and activities using Azure Data Factory's monitoring features, proactively identifying and addressing data processing issues to maintain pipeline reliability.
- Designed and implemented ETL processes to extract, transform, and load data from various sources into data warehouses.
- Identified and tested for bugs and performance bottlenecks within ETL solutions through thorough analysis and monitoring.
- Implemented Azure Databricks for real-time data processing and machine learning model training.
- Created and provisioned the Databricks clusters needed for batch and continuous streaming data processing, and installed the required libraries for the clusters.
- Developed advanced query optimization techniques as a Databricks SME, leveraging Spark's adaptive query execution to significantly reduce query latency and improve interactive analytics performance.
- Designed and optimized Spark jobs on Databricks, resulting in a 40% reduction in processing time for large-scale data transformations and analytics.
- Developed automated ETL workflows in Databricks, streamlining data integration from multiple sources and improving data availability for reporting and analysis.
- Implemented end-to-end ETL processes to extract, transform, and load data from heterogeneous sources into data warehouses.
- Optimized Talend jobs for performance and scalability, ensuring efficient processing of large volumes of data.
- Automated data pipelines using Python frameworks such as Airflow to ensure timely data delivery (see the illustrative sketch at the end of this section).
- Conducted exploratory data analysis (EDA) and visualization using Python's Matplotlib and Seaborn.
- Modified selected machine learning models with real-time data in Spark (PySpark).
- Utilized Spark SQL for querying structured data, improving query performance and enabling real-time analytics.
- Led the design and implementation of star schemas for multiple data warehousing projects, incorporating fact and dimension tables that facilitated fast query performance.
- Utilized advanced data modeling techniques, such as bridge tables and junk dimensions, to address complex business requirements and optimize data retrieval strategies in star schema configurations.
- Implemented Spark using Scala and Spark SQL for faster testing and processing of data, and managed data from different sources.
- Developed Spark programs using the Scala and Java APIs and performed transformations and actions on RDDs.
- Worked with architects to improve the cloud Hadoop architecture as needed for research.
- Monitored and maintained Airflow DAGs to ensure reliable and accurate execution of scheduled workflows.
- Developed data analytics solutions using Amazon Redshift Spectrum, enabling seamless querying of data across S3 and Redshift.
- Implemented and monitored data integration jobs using Informatica's scheduling and monitoring tools, ensuring timely execution and resolution of any issues or failures.
- Utilized Talend Open Studio for data integration and transformation tasks, including developing reusable components and leveraging built-in connectors for diverse data sources.
- Managed Snowflake cost optimization through resource monitoring and forecasting, ensuring efficient compute and storage utilization.
- Integrated Snowflake with Power BI for advanced healthcare data visualizations and reporting.
- Provided training and mentorship on Snowflake best practices, including data modeling and performance optimization.
- Developed SQL scripts for Power BI reports and dashboards to surface trends in the data as visualizations and reports for the teams.
- Automated repetitive data modeling tasks using Erwin's scripting capabilities and API integrations, significantly reducing development time and enhancing productivity for the data engineering team.
- Conducted in-depth performance tuning for data models, using Erwin's profiling tools to identify bottlenecks and optimize data access paths.
- Conducted comprehensive index maintenance, including regular rebuilds and reorganizations, to sustain optimal performance and minimize fragmentation.
- Managed multiple data migration projects, transitioning data from on-premises databases (e.g., SQL Server, Oracle) to cloud platforms.
- Designed RESTful APIs to facilitate real-time data exchange, improving integration between systems and enhancing overall data accessibility.
- Integrated Qlik with various data sources, including SQL databases, Excel files, and REST APIs, to ensure a seamless flow of data into the reporting environment.
- Created high-level, interactive dashboards using Power BI, facilitating data-driven decision-making for senior stakeholders.
- Managed version control for collaborative data engineering projects using Git and GitHub.
- Collaborated with cross-functional teams to track and prioritize data pipeline development tasks and issues in Jira.
Environment: Azure SQL Database, ADF, Azure Cosmos DB, Azure Databricks, Python, Spark, Spark SQL, Amazon Redshift Spectrum, Snowflake, SQL, Star Schema, Erwin, Data Models, Tableau, Git, GitHub, Jira.
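For illustration only: a minimal sketch of the kind of Airflow DAG described above for automating a data pipeline. The DAG id, schedule, and task callables are hypothetical placeholders rather than an actual production workflow.

from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.python import PythonOperator

def extract(**context):
    # e.g., pull a day's worth of records from a source system
    pass

def transform(**context):
    # e.g., clean and reshape the extracted records
    pass

def load(**context):
    # e.g., write the transformed records to the warehouse
    pass

default_args = {"retries": 2, "retry_delay": timedelta(minutes=5)}

# Hypothetical daily extract-transform-load DAG with simple linear dependencies.
with DAG(
    dag_id="daily_sales_pipeline",
    start_date=datetime(2021, 1, 1),
    schedule_interval="@daily",
    catchup=False,
    default_args=default_args,
) as dag:
    t_extract = PythonOperator(task_id="extract", python_callable=extract)
    t_transform = PythonOperator(task_id="transform", python_callable=transform)
    t_load = PythonOperator(task_id="load", python_callable=load)

    t_extract >> t_transform >> t_load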
Client: TATA AIG General Insurance Company Limited, Mumbai, India. Mar 2016 - Oct 2018
Role: Data Engineer
Responsibilities:
- Implemented scalable data storage solutions using AWS S3 for efficient data ingestion and archival.
- Designed and deployed AWS EC2 instances for processing and analyzing large datasets with tools such as Spark and Hadoop.
- Managed AWS Lambda functions to automate data processing pipelines and reduce operational overhead.
- Implemented AWS IAM roles and policies to ensure secure access and compliance with data governance standards.
- Contributed to the design and implementation of data lakes using AWS S3 and Azure Data Lake Storage, supporting both ETL and ELT strategies to enhance data availability for data scientists and analysts.
- Designed and developed ETL jobs using AWS Glue to extract, transform, and load data into AWS data lakes and data warehouses (see the illustrative sketch at the end of this section).
- Designed and implemented ETL processes using Informatica PowerCenter to extract, transform, and load data from various sources into the data warehouse.
- Experienced with PyUnit, the Python unit test framework, for all Python applications.
- Wrote functional and object-oriented Scala code to implement complex data processing logic and algorithms, ensuring code readability and maintainability.
- Implemented data partitioning and caching strategies in Spark, improving data processing efficiency for large-scale datasets.
- Designed and implemented large-scale data processing pipelines using Hadoop ecosystem tools such as HDFS, MapReduce, and YARN.
- Implemented advanced scripting techniques in Qlik to manipulate and transform data during the loading process, ensuring data integrity and accuracy.
- Implemented logging and exception-handling frameworks in PL/SQL, ensuring reliable data processing and easy troubleshooting.
- Leveraged Erwin Data Modeler to create and maintain detailed conceptual, logical, and physical data models, ensuring the data architecture met organizational standards and business requirements.
- Designed and optimized complex MySQL queries to improve data retrieval efficiency.
- Developed and published reports and dashboards using Power BI and wrote effective DAX formulas and expressions.
- Developed and utilized APIs for seamless data sharing and consumption across different platforms.
- Managed version control and code collaboration using Git for multiple data engineering projects.
Environment: AWS S3, AWS EC2, Spark, Hadoop, AWS Lambda, AWS IAM, ETL, Talend, AWS Glue, Informatica, Scala, PL/SQL, MySQL, Power BI, Git.
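For illustration only: a minimal sketch of the kind of AWS Glue ETL job described above. It assumes the standard Glue job environment; the catalog database, table, column names, and S3 bucket are hypothetical placeholders.

import sys

from awsglue.context import GlueContext
from awsglue.job import Job
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext

# Hypothetical Glue job: read from the Glue Data Catalog, apply a simple
# transformation, and write partitioned Parquet to S3.
args = getResolvedOptions(sys.argv, ["JOB_NAME"])

sc = SparkContext()
glue_context = GlueContext(sc)
spark = glue_context.spark_session
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

claims = glue_context.create_dynamic_frame.from_catalog(
    database="example_db", table_name="raw_claims"
)

# Convert to a Spark DataFrame for standard transformations.
df = claims.toDF().dropDuplicates(["claim_id"]).filter("claim_amount > 0")

(df.write
   .mode("overwrite")
   .partitionBy("claim_year")
   .parquet("s3://example-curated-bucket/claims/"))

job.commit()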
Client: Yashoda Hospitals, Hyderabad, India. Jul 2015 - Feb 2016
Role: Data Analyst
Responsibilities:
- Developed and maintained ETL pipelines on AWS, leveraging services such as Glue and Lambda.
- Implemented data storage solutions using S3, ensuring secure and scalable data management.
- Designed and executed data migration strategies to AWS, ensuring minimal downtime and data integrity.
- Designed and implemented real-time data migration solutions using Apache Kafka and AWS Lambda to support dynamic data integration needs.
- Developed ETL workflows using Sqoop to import/export data between Hadoop and relational databases.
- Utilized R for statistical analysis and data visualization to support business decision-making.
- Created interactive data explorations and visualizations using Jupyter Notebooks.
- Conducted data analysis and predictive modeling using SAS.
- Designed and developed business intelligence reports using Sisense.
- Utilized advanced Excel functions (VLOOKUP, pivot tables, macros) for data manipulation and analysis.
- Created insightful dashboards and reports using Tableau for data-driven decision-making.
- Utilized Cognos for reporting and performance management.
- Analyzed large datasets stored in Oracle and Exadata environments.
- Implemented data integration and transformation processes using Informatica.
- Developed BI reports and OLAP cubes to provide multidimensional data analysis.
- Created and managed data models and reports using Power Pivot.
- Built interactive visualizations and dashboards using QlikView.
- Automated data processing and analysis tasks using Python scripting (see the illustrative sketch at the end of this section).
- Conducted gap analysis to identify discrepancies and improvements in data processes.
Environment: AWS, Hadoop, Sqoop, R, Jupyter Notebooks, SAS, Sisense, Excel, Tableau, Oracle, Exadata, Informatica, OLAP Cubes, Power Pivot, QlikView, Python, Gap Analysis.
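For illustration only: a minimal sketch of the kind of Python scripting automation described above, using pandas to clean a raw extract and produce a summary for reporting. The file names and column names are hypothetical placeholders.

import pandas as pd

# Hypothetical example: load a raw extract, drop incomplete and duplicate rows,
# and aggregate a monthly summary table for downstream reporting.
raw = pd.read_csv("daily_admissions.csv", parse_dates=["admission_date"])

clean = (
    raw
    .dropna(subset=["patient_id", "department"])
    .drop_duplicates(subset=["patient_id", "admission_date"])
)

summary = (
    clean
    .groupby([clean["admission_date"].dt.to_period("M"), "department"])
    .size()
    .rename("admissions")
    .reset_index()
)

summary.to_csv("monthly_admissions_summary.csv", index=False)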
EDUCATION:
Jawaharlal Nehru Technological University, Hyderabad, TS, India
BTech in Computer Science and Engineering, June 2011 - May 2015