Candidate's Name
+1 Street Address 676 9722 | EMAIL AVAILABLE | West Haven, CT, USA | LINKEDIN LINK AVAILABLE | https://github.com/
SUMMARY
Senior Data Engineer with extensive experience in building infrastructure for machine learning and collaborative data science
projects. Successfully developed data pipelines and ETL workflows using Apache PySpark and AWS, facilitating real-time data
processing and advanced machine learning models. Proficient in cloud-based solutions such as AWS and Azure, ensuring high
data quality and reliability. Ready to leverage extensive data engineering skills to drive on-site interactions using cutting-edge
algorithms.
CERTIFICATIONS
Microsoft Certified - Azure Data Engineer Associate
Link to Certification
PROFESSIONAL EXPERIENCE
Bosch Global Software Technologies Coimbatore, India
Senior Data Engineer June 2021 - December 2022
Designed and implemented efficient, scalable data pipelines using AWS services such as S3, EMR, Glue, Lambda, and Step
Functions, together with Apache PySpark, for automated and reliable processing of large datasets.
Developed and maintained ETL workflows using Apache Airflow to extract, transform, and load data from diverse
sources into AWS data lakes and warehouses, ensuring seamless data flow and integration across environments.
Collaborated with data scientists and analysts to design and implement optimized data models, providing robust solutions
that enabled advanced analytics, machine learning model deployment, and data-driven decision-making for stakeholders.
Optimized data storage and retrieval by designing efficient schemas and leveraging AWS Redshift for data warehousing,
while ensuring data quality and reliability through data validation checks and monitoring with AWS CloudWatch and
custom logging solutions.
Conducted performance tuning and optimization of PySpark jobs, reducing processing times and resource consumption, and
migrated legacy ETL processes to modern cloud-based solutions, increasing efficiency and scalability.
Created interactive dashboards and reports using Power BI to visualize key performance metrics and trends, facilitating
data-driven decision-making for stakeholders.
Integrated Digital Info Services Chennai, India
Data Analyst May 2017 - May 2021
Built and optimized ELT/ETL pipelines using SQL, Python, Azure Data Factory, Apache Airflow, and
Azure Synapse Analytics for batch and streaming data, specializing in scalable data pipelines for healthcare data
integration with Azure and Databricks.
Designed and maintained b to collect, process, and store large volumes of data from various sources, enhancing overall database
and data warehouse performance and compliance with data governance protocols.
Implemented Extract, Transform, Load (ETL) processes to ensure data quality across multiple platforms.
Enhanced data processing capabilities and scalability by implementing Azure Synapse Analytics.
Utilized Azure SQL Database and Azure Data Lake Storage for data warehousing and scalable storage solutions,
optimizing performance and ensuring high availability.
Implemented robust data integration solutions, complex SQL queries, and efficient data validation and error handling
mechanisms to ensure high data quality and reliability.
SKILLS
Cloud Computing: AWS (S3 Buckets, EC2 servers, Lambda, SES, Kafka, Athena, Glue), Azure (Data Factory, Data Storage, Azure Synapse Analytics, Databricks, Machine Learning Studio), GCP
Big Data Technologies: Apache PySpark, Apache Hadoop, Apache Kafka, Spark, Flink
Programming Languages: Python, Go, C/C++, .NET, Scala, Java
Database: SQL, MongoDB, ChromaDB, Pinecone, NoSQL, Oracle Cloud Database
Data Integration: Power Query, ETL Processes, SQL, APIs, Excel
Data Visualization: Power BI Desktop, Power BI Service, Power BI Mobile, Tableau
Data Modeling: DAX, Data Modeling, Power Query, Data Integration, Window Functions
ETL Tools: Informatica PowerCenter, SQL Server SSIS, Snowpipe, Apache Airflow, ETL Pipeline Design, Data Migration, Database Management
MLOps Tools: MLflow, GitHub Actions, CI/CD
Artificial Intelligence: Machine learning, NLP
Data Handling: Data Storage Patterns, Data Lakes, Data Pipelines, Data Processing Frameworks, Open Source Frameworks, JSON Data
Data Security: Data Security
Data Engineering: Data Warehousing, Distributed Systems
Monitoring and Logging: AWS CloudWatch, ELK Stack
PROJECTS & OUTSIDE EXPERIENCE
1. Olympic Data Analytics
Developed an Olympic Data Analytics solution leveraging Azure services such as Azure Data Factory, Azure Data
Storage, Azure Databricks, Azure Synapse Analytics, and Power BI. Orchestrated end-to-end data workflows, from
extraction to visualization, enabling stakeholders to derive actionable insights from Olympic data efficiently.
Link to project
2. YouTube Data Analysis
Designed and implemented a YouTube data analysis pipeline leveraging AWS services. Utilized AWS S3 for data storage,
AWS Glue and Lambda for data processing, and AWS Athena for querying and analysis. Enabled data visualization and
insights through integration with QuickSight and Power BI.
Link to project
EDUCATION
University of New Haven January 2023 - May 2024
Master's, Data Science