Candidate's Name
EMAIL AVAILABLE | PHONE NUMBER AVAILABLE | Arlington, VA Street Address, USA
LinkedIn | GitHub | Portfolio
PROFILE SUMMARY
Data Engineering Professional with 3+ years of experience delivering data solutions across on-premises and cloud
platforms, leveraging expertise in Big Data Engineering, Spark, PySpark, SQL, Python, and cloud technologies
(AWS, Azure). Proven track record in data migration, ETL optimization, data warehousing, and analysis, with a
strong focus on improving data quality, performance, and scalability. Skilled in designing and implementing
scalable data pipelines and warehouse solutions on Azure and AWS.
TECHNICAL SKILLS
Languages: Python, SQL, PL/SQL
Big Data Technologies: Apache Airflow, Apache Hadoop, Apache Kafka, DBT, Talend, Spark, HDFS,
NiFi, Microsoft Fabric
Databases and Data Warehouses: MySQL, Oracle, SQL Server, PostgreSQL, AWS Redshift, Snowflake, DynamoDB
Cloud Technologies: AWS (S3, EC2, EMR, RDS, Lambda, Glue, IAM, CloudWatch, DynamoDB, ECR, ECS,
EBS, VPC), Azure (Data Lake, ADF, Databricks, Synapse Analytics, Blob Storage, SQL),
Docker, Kubernetes, GitLab, Terraform
Visualization Tools: Tableau, Power BI
WORK EXPERIENCE
Morgan Stanley, Washington, D.C. June 2023 - Present
Data Engineer
Developed and maintained efficient data pipeline architectures within Microsoft Azure, leveraging Azure
Data Factory, Azure Databricks, and Azure Data Lake Storage Gen2.
Developed Azure Data Factory pipelines for bulk data transfer from relational databases to Azure Data
Lake Gen2, ensuring smooth integration.
Designed and implemented a data management system on Azure Data Lake Storage that efficiently handles
large volumes of structured and unstructured data and enables high-performance data retrieval and
analytics.
Leveraged Azure SQL Database and Cosmos DB to architect, develop, and fine-tune database schemas
and queries, achieving a 30% improvement in data retrieval performance.
Transformed ETL operations by leveraging Azure Data Factory and Databricks to automate workflows,
resulting in a 45% boost in processing speed, improved data quality, and increased business agility.
Improved data processing speed by 40% by designing and implementing streamlined ETL processes and
optimized pipelines.
Applied Spark SQL functions to migrate data from staging Hive tables into fact and dimension tables (see the illustrative sketch after this role's bullets).
Created Databricks notebooks (Python, PySpark, Scala, Spark SQL) to transform data across Azure
Data Lake Gen2's Raw, Stage, and Curated zones.
Implemented JSON scripts to deploy Azure Data Factory (ADF) pipelines, utilizing SQL Activity for data
processing.
Built technology demonstrations using Azure Event Hubs, Stream Analytics, and Power BI.
Engineered merge scripts for UPSERT operations in Snowflake from ETL sources.
Managed importing and exporting data from HDFS using Sqoop, addressing access, performance, and
patch/upgrade issues.
Managed Docker containers, including snapshots, attaching to running containers, image removal,
directory structure management, and container administration.
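Illustrative sketch (not code from the role above; database, table, and column names are hypothetical placeholders): a minimal PySpark job showing the staging-Hive-table-to-dimension-table load pattern referenced in the Spark SQL bullet.

```python
# Minimal sketch of a staging-to-dimension load with Spark SQL.
# Database, table, and column names are hypothetical placeholders.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("staging-to-dimension-load")
    .enableHiveSupport()  # required to read the staging Hive tables
    .getOrCreate()
)

# Deduplicate the staged records and shape them to the dimension's columns.
dim_customer = spark.sql("""
    SELECT DISTINCT
           customer_id,
           customer_name,
           region,
           current_date() AS load_date
    FROM   staging_db.stg_customers
""")

# Write into the curated dimension table (assumes the table already exists).
dim_customer.write.mode("overwrite").insertInto("curated_db.dim_customer")
```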
Society for Health and Medical Technology, Hyderabad, India January 2021 - July 2022
Data Engineer / Data Analyst
Created pipelines to collect data from diverse sources including user interactions, medical devices, and external
healthcare databases.
Designed CloudFormation templates for AWS services such as SQS, Elasticsearch, DynamoDB, Lambda, EC2,
VPC, RDS, S3, IAM, and CloudWatch, ensuring seamless integration with Service Catalog.
Developed AWS Lambda functions in Python for efficient deployment management and integrated public-facing
websites on Amazon Web Services with other application infrastructures.
Engineered and implemented ETL processes using AWS Glue to migrate data from external sources like S3 and
ORC/Parquet/Text Files into AWS Redshift.
Created external tables with partitions using Hive, AWS Athena, and Redshift, and wrote PySpark code for AWS
Glue jobs and EMR tasks.
Demonstrated proficiency in AWS services including S3, EC2, IAM, and RDS, and expertise in orchestration and
data pipelines using AWS Step Functions, Data Pipeline, and Glue.
Utilized and optimized relational databases (e.g., Microsoft SQL Server, Oracle, MySQL) and columnar
databases (e.g., Amazon Redshift, Microsoft SQL Data Warehouse).
Improved query performance by 35% through expert SQL optimization, resulting in faster data retrieval,
enhanced user experience, and optimized system resources.
Stored log files securely in AWS S3 with versioning for highly sensitive information, and integrated AWS
DynamoDB via AWS Lambda for storing item values and backup via DynamoDB streams.
Optimized resource utilization and leveraged cost-effective AWS services, achieving a 20% reduction in
cloud costs.
Automated routine AWS tasks such as snapshot creation using Python scripts, and installed and configured
Apache Airflow for AWS S3 buckets, creating DAGs to manage Airflow workflows (see the illustrative DAG sketch after this role's bullets).
Prepared scripts to automate data ingestion using PySpark and Scala from various sources including APIs, AWS
S3, Teradata, and Redshift.
Implemented AWS Data Pipeline to configure data loads from S3 into Redshift, and developed Python scripts
for data aggregation, querying, and writing data back into OLTP systems using DataFrames, SQL, and
RDDs/MapReduce in Spark.
Created dynamic, intuitive Tableau dashboards and reports, improving data visualization and informed
decision-making by 30% and providing stakeholders with actionable insights and comprehensive analytics
to drive business growth.
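Illustrative sketch (hypothetical names throughout): an Airflow DAG of the kind described above, issuing a Redshift COPY for files landed in S3 via the boto3 Redshift Data API; the cluster, database, bucket, IAM role, and table identifiers are placeholders.

```python
# Sketch of an S3-to-Redshift load orchestrated by Airflow; all identifiers are placeholders.
from datetime import datetime, timedelta

import boto3
from airflow import DAG
from airflow.operators.python import PythonOperator


def copy_s3_to_redshift(**_):
    """Issue a Redshift COPY for the day's files landed in S3 (sketch only)."""
    client = boto3.client("redshift-data", region_name="us-east-1")
    client.execute_statement(
        ClusterIdentifier="analytics-cluster",  # hypothetical cluster
        Database="analytics",
        DbUser="etl_user",
        Sql="""
            COPY analytics.events
            FROM 's3://example-landing-bucket/events/'
            IAM_ROLE 'arn:aws:iam::123456789012:role/redshift-copy-role'
            FORMAT AS PARQUET;
        """,
    )


with DAG(
    dag_id="s3_to_redshift_daily",
    start_date=datetime(2022, 1, 1),
    schedule_interval="@daily",
    catchup=False,
    default_args={"retries": 2, "retry_delay": timedelta(minutes=5)},
) as dag:
    PythonOperator(
        task_id="copy_s3_to_redshift",
        python_callable=copy_s3_to_redshift,
    )
```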
EDUCATION
Master of Science (M.S.) in Computer Science August 2022 - May 2024
The George Washington University, Washington, D.C.
Bachelor of Technology (B.Tech) in Computer Science July 2018 - May 2022
Gitam University, Hyderabad, Telangana, India
PROJECTS
Reddit Data Pipeline | Data Engineering
The Reddit Data Pipeline for the "AAPL" subreddit extracts, processes, and analyzes user-generated content from Reddit. By
integrating PRAW, Amazon S3, Snowflake, and Apache Airflow, the pipeline delivers actionable insights and trends
surrounding Apple Inc.
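A minimal sketch of the extraction step, assuming PRAW API credentials and an S3 landing bucket (all credentials and names below are placeholders); the Airflow scheduling and Snowflake load stages are omitted.

```python
# Sketch of the Reddit extraction step; credentials and bucket/key names are placeholders.
import json

import boto3
import praw

reddit = praw.Reddit(
    client_id="YOUR_CLIENT_ID",
    client_secret="YOUR_CLIENT_SECRET",
    user_agent="aapl-pipeline/0.1",
)

# Pull the latest submissions from the AAPL subreddit.
posts = [
    {
        "id": post.id,
        "title": post.title,
        "score": post.score,
        "created_utc": post.created_utc,
        "num_comments": post.num_comments,
    }
    for post in reddit.subreddit("AAPL").new(limit=100)
]

# Land the raw extract in S3; Airflow would schedule this and Snowflake would load it downstream.
boto3.client("s3").put_object(
    Bucket="example-reddit-raw",
    Key="aapl/posts.json",
    Body=json.dumps(posts),
)
```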
Stadium List Data Pipeline | Data Engineering
This project is designed to automate the process of fetching, cleaning, and processing stadium data from Wikipedia using Python
and Apache Airflow. The cleaned data is then stored in Azure Data Lake for further analysis and processing. This pipeline serves as
a foundation for data-driven insights into stadium statistics and characteristics.
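A minimal sketch of the fetch-and-clean step with pandas; the exact Wikipedia page and the Azure Data Lake write are assumptions, with a local Parquet file standing in for the lake.

```python
# Sketch only: fetch a stadium table from Wikipedia and apply basic cleaning.
# The page URL is an assumption; the real pipeline's source page may differ.
import pandas as pd

URL = "https://en.wikipedia.org/wiki/List_of_association_football_stadiums_by_capacity"

# read_html parses every HTML table on the page; take the first (main) table.
stadiums = pd.read_html(URL)[0]

# Basic cleaning: normalize column names and drop fully empty rows.
stadiums.columns = [str(col).strip().lower().replace(" ", "_") for col in stadiums.columns]
stadiums = stadiums.dropna(how="all")

# In the full pipeline an Airflow task would write this to Azure Data Lake
# (e.g. Parquet over abfss://); a local file stands in here.
stadiums.to_parquet("stadiums.parquet", index=False)
```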
WeCureIT | Software Engineer - Backend and Database Development
Led backend infrastructure development for WeCureIT's appointment booking website as Scrum Leader, fostering teamwork and
effective communication. Delivered project milestones efficiently, ensuring alignment with stakeholder requirements.