Candidate's Name
New Jersey, USA | PHONE NUMBER AVAILABLE | EMAIL AVAILABLE | LinkedIn
SUMMARY
Data Engineer with 5 years of experience in designing and maintaining scalable ETL pipelines using Apache Spark, Python, and SQL.
Proficient in AWS services, including Glue, Redshift, SageMaker, and DMS, as well as Azure Data Factory. Skilled in data integration,
predictive maintenance, and real-time data streaming with Apache Kafka and Amazon Kinesis. Demonstrated success in implementing
data quality and governance frameworks with Apache Atlas and AWS Glue Data Catalog. Adept at deploying machine learning models,
creating custom dashboards with Tableau, and automating pipeline monitoring with Prometheus and Grafana.
EDUCATION
New Jersey Institute of Technology, NJ, USA Dec 2023
Master of Science in Computer Science
Anna University, Chennai, India Apr 2020
Bachelor of Engineering (B.Tech.), Computer Science and Engineering
PROFESSIONAL EXPERIENCE
Humana, USA | Data Engineer May 2023 - Present
Designed and maintained scalable ETL pipelines using Apache Spark, Python, and SQL. Conducted performance tuning for SQL and NoSQL
databases and Google BigQuery, cutting query response times by 30%.
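A minimal sketch of this kind of Spark-based ETL step (paths, column names, and rules below are hypothetical placeholders, not the production pipeline):

```python
# Illustrative PySpark ETL step: extract raw records, apply transformations, load to a curated zone.
# All paths, column names, and thresholds are hypothetical placeholders.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("claims_etl").getOrCreate()

raw = spark.read.parquet("s3a://raw-zone/claims/")            # extract
curated = (
    raw.filter(F.col("claim_amount") > 0)                     # basic data-quality rule
       .withColumn("claim_date", F.to_date("claim_ts"))       # normalize timestamp to date
       .dropDuplicates(["claim_id"])                          # de-duplicate on business key
)
curated.write.mode("overwrite").parquet("s3a://curated-zone/claims/")  # load
```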
Implemented workflows with AWS Glue, Amazon Redshift, and Azure Synapse, optimizing data transformation and reducing
processing time by 40%.
Designed and implemented Snowflake database schemas and tables, optimizing for performance and scalability.
Utilized Snowflake's cloud data platform to manage and analyze large datasets, ensuring efficient data storage and retrieval.
Integrated Snowflake with Python/PySpark for advanced data processing and transformation tasks.
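A hedged sketch of this Snowflake/PySpark integration pattern using the Snowflake Spark connector (assumes the connector is on the Spark classpath; connection details and table names are placeholders):

```python
# Hypothetical Snowflake <-> PySpark round trip via the Snowflake Spark connector.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("snowflake_integration").getOrCreate()

sf_options = {
    "sfURL": "<account>.snowflakecomputing.com",   # placeholder connection values
    "sfUser": "<user>",
    "sfPassword": "<password>",
    "sfDatabase": "ANALYTICS",
    "sfSchema": "PUBLIC",
    "sfWarehouse": "TRANSFORM_WH",
}

members = (
    spark.read.format("net.snowflake.spark.snowflake")
         .options(**sf_options)
         .option("dbtable", "MEMBERS")
         .load()
)

# ...PySpark transformations on `members` would go here...

(
    members.write.format("net.snowflake.spark.snowflake")
           .options(**sf_options)
           .option("dbtable", "MEMBERS_ENRICHED")
           .mode("overwrite")
           .save()
)
```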
Used Docker for containerization and Kubernetes for orchestration to streamline deployment and management of data applications.
Tuned Snowflake queries and optimized performance using best practices for data warehousing and SQL execution.
Developed and deployed AI-enhanced data quality solutions, incorporating anomaly detection algorithms and natural language
processing (NLP) techniques to identify and resolve data inconsistencies. Utilized TensorFlow and PyTorch for model training, which
improved data accuracy by 30% and streamlined data governance processes.
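As a simplified stand-in for the anomaly-detection idea (the models referenced above used TensorFlow/PyTorch), a scikit-learn IsolationForest can flag suspect rows in a batch; columns, file names, and the contamination rate here are hypothetical:

```python
# Simplified data-quality anomaly check: flag suspicious rows in a numeric feed for review.
import pandas as pd
from sklearn.ensemble import IsolationForest

df = pd.read_parquet("claims_batch.parquet")              # placeholder input
features = df[["claim_amount", "days_to_payment"]].fillna(0)

detector = IsolationForest(contamination=0.01, random_state=42)
df["anomaly"] = detector.fit_predict(features)            # -1 = anomalous row, 1 = normal

suspect_rows = df[df["anomaly"] == -1]                    # route these to a data-quality review queue
print(f"{len(suspect_rows)} rows flagged for review")
```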
Worked with various stakeholders, including data, design, product, and executive teams, to assist with data-related technical issues.
Developed and maintained data pipelines using Python, leveraging Apache Airflow for workflow scheduling and orchestration, and
integrating Apache Flink for stream processing.
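A minimal Airflow 2.x DAG sketch of the daily scheduling pattern described above; the task bodies and schedule are placeholders:

```python
# Hypothetical daily extract -> transform -> load DAG (Airflow 2.4+ style "schedule" argument).
from datetime import datetime
from airflow import DAG
from airflow.operators.python import PythonOperator

def extract():
    ...  # pull source data (placeholder)

def transform():
    ...  # clean and enrich (placeholder)

def load():
    ...  # write to the warehouse (placeholder)

with DAG(
    dag_id="daily_claims_pipeline",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,
) as dag:
    t_extract = PythonOperator(task_id="extract", python_callable=extract)
    t_transform = PythonOperator(task_id="transform", python_callable=transform)
    t_load = PythonOperator(task_id="load", python_callable=load)

    t_extract >> t_transform >> t_load
```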
Ensured GDPR and CCPA compliance by designing and implementing data pipelines with robust data anonymization, encryption, and
audit mechanisms, enabling secure handling of sensitive data while fulfilling data subject access requests (DSARs) in a timely manner.
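One illustrative anonymization step of the kind such pipelines use, salted hashing of direct identifiers (field names and salt handling are placeholders; a real pipeline would source the salt from a secrets manager):

```python
# Pseudonymize direct identifiers so records stay joinable but are not directly identifying.
import hashlib
import os

SALT = os.environ.get("PII_HASH_SALT", "change-me")   # placeholder; use a secrets manager in practice

def pseudonymize(value: str) -> str:
    """Return a salted SHA-256 digest of a personal identifier."""
    return hashlib.sha256((SALT + value).encode("utf-8")).hexdigest()

record = {"member_id": "A123456", "email": "jane@example.com", "claim_amount": 182.50}
record["member_id"] = pseudonymize(record["member_id"])
record["email"] = pseudonymize(record["email"])
```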
Spearheaded the development and deployment of multiple machine learning and deep learning models across various business units,
optimizing predictive analytics workflows, and reducing data processing time by 30%.
Oversaw the installation, configuration, and maintenance of data center hardware, including servers, storage systems, and networking
equipment, ensuring high availability, security, and optimal environmental conditions. Implemented best practices for power
management, cooling, and space optimization to enhance data center efficiency and reliability.
Implemented and optimized query APIs using JSON, XML, and other formats, ensuring efficient data retrieval and integration across
systems. Employed Unix-based command-line interfaces and Bash scripts to automate tasks, manage system operations, and streamline
data workflows.
Applied in-depth knowledge of linear algebra, calculus, and probability theory to optimize model performance, achieving a 10%
increase in model accuracy for predictive analytics solutions.
Managed and optimized NoSQL databases, with a focus on Firestore and MongoDB, ensuring efficient data storage and retrieval.
Managed and orchestrated complex workflows using Kubernetes and Airflow, ensuring high availability and fault tolerance in data
processing pipelines.
Handled resource provisioning, monitoring, and cost and performance optimization in cloud environments. Spearheaded the development
of a unified customer data platform using AWS Redshift, Spark, and Snowpark, consolidating data from multiple sources and enabling
advanced customer analytics and personalized marketing strategies.
Led the integration of disparate healthcare data sources using FHIR standards, AWS Glue, and Apache Storm, improving data
interoperability and patient care coordination. Implemented progressive deployment models and emerging technologies to enhance
data engineering practices and capabilities.
Deployed machine learning models using AWS SageMaker and Apache Airflow, boosting predictive analytics accuracy by 25%.
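A hedged sketch of an endpoint deployment with the SageMaker Python SDK (the image URI, model artifact, role, and instance type are placeholders, not the actual configuration):

```python
# Deploy a trained model artifact as a real-time SageMaker endpoint (all values hypothetical).
import sagemaker
from sagemaker.model import Model

session = sagemaker.Session()
model = Model(
    image_uri="<ecr-inference-image-uri>",                     # placeholder inference container
    model_data="s3://models/readmission-risk/model.tar.gz",    # placeholder model artifact
    role="<sagemaker-execution-role-arn>",                     # placeholder IAM role
    sagemaker_session=session,
)
predictor = model.deploy(initial_instance_count=1, instance_type="ml.m5.large")
# predictor.predict(payload) can then be invoked from downstream jobs (e.g. an Airflow task).
```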
Sigma InfoSolutions, India | Data Engineer Nov 2019 - Aug 2022
Designed and implemented scalable big data architectures tailored to business needs, balancing performance requirements with
considerations for data volume, velocity, and variety.
Utilized tools like Elasticsearch and NiFi to enhance data processing and search capabilities.
Applied in-depth Extract, Transform, Load (ETL) skills to orchestrate the smooth movement of data, optimizing processes for
efficiency and maintaining data integrity throughout. Leveraged Apache Oozie for workflow scheduling and orchestration.
Conducted data migration and optimization projects, improving data processing efficiency by 35% through Snowflake enhancements.
Applied encryption and access controls to secure sensitive data in Hadoop clusters, Azure Databricks, and GCP BigQuery. Utilized GCP
IAM and Azure IAM for identity and access management, and incorporated Azure Event Hubs for real-time data ingestion and
processing. Utilized Docker and Kubernetes for scalable deployment of data engineering solutions.
Conducted performance tuning for ETL processes, resulting in a 20% improvement in resource utilization. Utilized GCP BigQuery,
Dataflow, and Dataproc for optimized data processing.
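An illustrative google-cloud-bigquery call of the kind used for such query work (the project, dataset, and SQL are placeholders):

```python
# Run a hypothetical aggregation as a BigQuery query job and iterate over the results.
from google.cloud import bigquery

client = bigquery.Client(project="my-analytics-project")   # placeholder project

sql = """
    SELECT customer_id, SUM(order_total) AS lifetime_value
    FROM `my-analytics-project.sales.orders`
    WHERE order_date >= '2022-01-01'
    GROUP BY customer_id
"""
job = client.query(sql)                 # standard-SQL query job
for row in job.result():
    print(row.customer_id, row.lifetime_value)
```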
Worked closely with data scientists to understand data requirements and provided support for data extraction, transformation, and
analysis. Designed scalable data architectures that accommodate growing data volumes and evolving business needs, ensuring future-
proof solutions.
Designed and maintained data models for efficient data management and integration, ensuring alignment with business needs.
Implemented monitoring solutions, resulting in a 40% reduction in data quality issues through proactive identification and
resolution. Used Azure Monitor and GCP Cloud Monitoring for monitoring and alerting.
Utilized Snowflake for data warehousing, implementing optimized storage and retrieval solutions to support scalable analytics and
reporting. Integrated Elasticsearch for enhanced search functionalities and data retrieval.
Designed and optimized data warehousing solutions on Snowflake, implementing scalable data models and query performance
enhancements that resulted in a 30% reduction in query response times and improved data accessibility for business intelligence.
Led the migration of on-premises data centers to cloud-based Snowflake environments, executing data extraction, transformation,
and loading (ETL) processes with Azure Data Factory and AWS Glue. Successfully transitioned over 50TB of data with zero downtime,
enhancing data scalability and reliability.
Architected and optimized scalable data pipelines using Apache Spark and Snowflake, integrating real-time data streams from Kafka
to enable near-instantaneous analytics. Improved data processing efficiency by 35% and reduced latency in real-time reporting.
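A sketch of the Kafka-to-Spark Structured Streaming leg of such a pipeline (assumes the spark-sql-kafka package is available; brokers, topic, schema, and sink paths are placeholders, with the Snowflake load happening downstream):

```python
# Read a hypothetical "orders" topic from Kafka, parse JSON payloads, land them as parquet.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.types import StructType, StructField, StringType, DoubleType

spark = SparkSession.builder.appName("orders_stream").getOrCreate()

schema = StructType([
    StructField("order_id", StringType()),
    StructField("amount", DoubleType()),
])

events = (
    spark.readStream.format("kafka")
         .option("kafka.bootstrap.servers", "broker1:9092")   # placeholder brokers
         .option("subscribe", "orders")                       # placeholder topic
         .load()
         .select(F.from_json(F.col("value").cast("string"), schema).alias("e"))
         .select("e.*")
)

query = (
    events.writeStream.format("parquet")
          .option("path", "s3a://stream-landing/orders/")
          .option("checkpointLocation", "s3a://stream-landing/_checkpoints/orders/")
          .start()
)
```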
Led the migration of on-premises data infrastructures to cloud-based environments, leveraging Snowflake and AWS Glue. Executed
data transformation and loading processes that facilitated the seamless transition of over 100TB of data, enhancing system scalability
and performance.
Developed and maintained scalable ETL pipelines using Azure Data Factory, Azure Databricks, and Azure Logic Apps to support
business analytics and reporting needs.
Utilized SQL, MongoDB, and Cosmos DB to extract, retrieve, and analyze data, bridging the gap between technical and business users
for data-driven decision-making.
Implemented robust data security and compliance measures, including data encryption, access control, and data masking, to
safeguard sensitive information within Azure cloud environments.
Employed machine learning libraries and frameworks such as TensorFlow, PyTorch, and LangChain to build and optimize models,
resulting in a 30% reduction in training time and increased model reliability.
Implemented robust CI/CD pipelines using Azure DevOps and GitLab, automating the build, test, and deployment processes for data
engineering solutions. Enhanced workflow automation, reducing deployment times by 50% and ensuring high-quality, error-free
releases.
Developed and maintained advanced data governance frameworks, incorporating automated metadata management and data lineage
tracking using tools like Collibra. Strengthened data integrity and compliance, achieving a 25% reduction in data discrepancies and
improving audit readiness.
Utilized AWS S3 together with Apache Iceberg table formats for scalable data storage and management, ensuring seamless integration
with cloud-based architectures. Built and deployed web applications using the Django and Flask frameworks, integrating with relational
databases and external APIs to deliver dynamic and user-friendly solutions.
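A minimal Flask sketch of the API-backed pattern mentioned above (the route and data source are hypothetical stand-ins for a real database lookup):

```python
# Tiny JSON API endpoint; the in-memory dict stands in for a relational-database query.
from flask import Flask, jsonify

app = Flask(__name__)

CUSTOMERS = {"42": {"name": "Acme Corp", "tier": "gold"}}   # placeholder data

@app.route("/customers/<customer_id>")
def get_customer(customer_id):
    customer = CUSTOMERS.get(customer_id)
    if customer is None:
        return jsonify({"error": "not found"}), 404
    return jsonify(customer)

if __name__ == "__main__":
    app.run(debug=True)
```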
Designed and managed cloud-based data solutions on AWS and Azure, utilizing services such as AWS S3, Lambda, and Azure Data
Factory. Optimized resource provisioning through Infrastructure as Code (IaC) with Terraform, leading to a 40% reduction in
deployment time and improved resource management.
Engineered end-to-end ETL solutions with Databricks, incorporating machine learning models for predictive analytics and anomaly
detection. Enhanced data accuracy and processing efficiency by 30%, enabling more insightful and timely business decisions.
Built and optimized data lakes using AWS S3 and Snowflake, integrating diverse data sources and implementing data partitioning
strategies. Achieved a 40% increase in query performance and reduced storage costs through efficient data management practices.
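An illustrative partitioning strategy for an S3-backed lake, as described above (bucket names, columns, and partition keys are placeholders):

```python
# Partition a hypothetical orders dataset by year/month so downstream queries can prune partitions.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("lake_partitioning").getOrCreate()

orders = spark.read.parquet("s3a://landing/orders/")

(
    orders.withColumn("order_date", F.to_date("order_ts"))
          .withColumn("order_year", F.year("order_date"))
          .withColumn("order_month", F.month("order_date"))
          .write.partitionBy("order_year", "order_month")   # partition pruning at query time
          .mode("overwrite")
          .parquet("s3a://lake/orders/")
)
```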
Utilized advanced data quality techniques, including AI-driven anomaly detection and data cleansing algorithms, to maintain high
data accuracy and consistency. Improved data quality by 30%, boosting stakeholder confidence in analytical outcomes.
TECHNICAL SKILLS
Methodologies: SDLC, Agile/ Scrum, Waterfall
Languages: Python, SQL, R, Scala, Java
Python Packages: Pandas, NumPy, Matplotlib, SciPy, Scikit-Learn, Seaborn, PyTorch, TensorFlow, ggplot2, Plotly, Keras
Data Analytics Skills: Data Manipulation, Data Cleaning, Data Visualization, Exploratory Data Analysis, Data Analysis
Data Components: HDFS, Hue, MapReduce, Pig, Hive, Hudi, HCatalog, HBase, Sqoop, DBT, Impala, Zookeeper, Flume, YARN, Cloudera
Manager, Kerberos, PySpark, Airflow, Kafka, MongoDB, Snowflake (Snowflake Console, Presto, SQL Worksheet, Snowpipe, Snowpark),
Informatica, Talend, T-SQL and PL/SQL
Technical Skills: AWS, Azure (Databricks), GCP, Docker and Kubernetes, Smartsheet, NLP, A/B Testing, Hypothesis Testing, ETL,
Databricks, Hadoop, Spark, Snowflake, BigQuery, Microsoft Fabric, Apache Airflow
Core: Data Engineering, Data Pipeline, ETL (Extract, Transform, Load), Data Integration, Data Warehousing, Big Data, Data Modeling, Data
Migration, Data Cleansing, Data Transformation, Data Aggregation, Data Architecture, Real-Time Data Processing, Batch Processing, Data
Ingestion, Data Center, Data Governance, Data Quality, Data Catalog, Metadata Management, Distributed Systems, Data Partitioning,
Columnar Storage, Cloud Data Platforms, Data Staging, Stream Processing, OLTP/OLAP, Data Privacy & Security
Others: Critical Thinking, Communication Skills, Presentation Skills, Problem-Solving
Databases: MySQL, MS SQL Server, Oracle.
Tools: Tableau, Power BI, Advanced Excel, MS Office Suite (Excel, Visio, Access, Word, PowerPoint), Visual Studio, GIT, Jupyter Notebook
Version Control: Git, GitHub
Operating Systems: Windows, macOS
CERTIFICATION
Microsoft Certified: Azure Data Engineer Associate (DP-203).
IBM Developer Skills Network: Python for Data Science.
Udemy Certified: Data Engineering Essentials using SQL, Python and PySpark.