                                                 Candidate's Name
                EMAIL AVAILABLE | PHONE NUMBER AVAILABLE | Arlington, VA Street Address , USA
                                       Linkedin | Github | Portfolio

                                                 PROFILE SUMMARY
Data Engineering Professional with 3+ years of experience delivering data solutions across multiple platforms, with
expertise in Big Data Engineering, Spark, PySpark, SQL, Python, and cloud technologies (AWS, Azure). Proven track
record in data migration, ETL process optimization, data warehousing, and analysis, with a strong focus on improving
data quality, performance, and scalability. Skilled in designing and implementing scalable data solutions on Azure
and AWS services.

TECHNICAL SKILLS
Languages                    : Python, SQL, P, PL/SQL
Big Data Technologies         : Apache Airflow, Apache Hadoop, Apache Kafka, dbt, Talend, Spark, HDFS,
                                 NiFi, Microsoft Fabric
Databases and Data Warehouse : MySQL, Oracle, SQL Server, PostgreSQL, AWS Redshift, Snowflake, DynamoDB
Cloud Technologies            : AWS (S3, EC2, EMR, RDS, Lambda, Glue, IAM, CloudWatch, DynamoDB, ECR, ECS,
                                 EBS, VPC), Azure (Data Lake, ADF, Databricks, Synapse Analytics, Blob Storage, SQL),
                                 Docker, Kubernetes, GitLab, Terraform
Visualization Tools           : Tableau, Power BI

                                               WORK EXPERIENCE
Morgan Stanley, Washington, D.C.                                                           June 2023 - Present
Data Engineer
     Developed and maintained efficient data pipeline architectures within Microsoft Azure, leveraging Azure
       Data Factory, Azure Databricks, and Azure Data Lake Storage Gen2.
     Developed Azure Data Factory pipelines for bulk data transfer from relational databases to Azure Data
       Lake Gen2, ensuring smooth integration.
     Utilized expertise in Azure Data Lake Storage to design and implement a sophisticated data management
       system, efficiently handling massive volumes of structured and unstructured data and enabling high-
       performance data retrieval and analytics capabilities.
     Leveraged Azure SQL Database and Cosmos DB to architect, develop, and fine-tune database schemas
       and queries, achieving a 30% improvement in data retrieval performance through query and schema
       optimization.
     Transformed ETL operations by leveraging Azure Data Factory and Databricks to automate workflows,
       resulting in a 45% boost in processing speed, improved data quality, and increased business agility.
     Improved data processing speed by 40% by designing and implementing streamlined ETL processes and
       optimized pipelines.
     Applied Spark SQL functions to migrate data from staging Hive tables to both fact and dimension tables.
     Created Databricks notebooks (Python, PySpark, Scala, Spark SQL) for transforming data across Azure
       Data Lake Gen2's Raw, Stage, and Curated zones.
     Implemented JSON scripts to deploy Azure Data Factory (ADF) pipelines, utilizing SQL Activity for data
       processing.
     Built technology demonstrations using Azure Event Hubs, Stream Analytics, and Power BI.
     Engineered merge scripts for UPSERT operations in Snowflake from ETL sources.
     Managed importing and exporting data from HDFS using Sqoop, addressing access, performance, and
       patch/upgrade issues.
     Experienced in Docker container management, including snapshots, attaching to running containers, image
       removal, directory structure management, and container administration.
Society for Health and Medical Technology, Hyderabad, India                                  January 2021 - July 2022
Data Engineer / Data Analyst
   Created pipelines to collect data from diverse sources including user interactions, medical devices, and external
     healthcare databases.
   Designed CloudFormation templates for AWS services such as SQS, Elasticsearch, DynamoDB, Lambda, EC2,
     VPC, RDS, S3, IAM, and CloudWatch, ensuring seamless integration with Service Catalog.
   Developed AWS Lambda functions in Python for efficient deployment management and integrated public-facing
     websites on Amazon Web Services with other application infrastructures.
   Engineered and implemented ETL processes using AWS Glue to load data from external sources, such as ORC,
     Parquet, and text files in S3, into AWS Redshift.
   Created external tables with partitions using Hive, AWS Athena, and Redshift, and wrote PySpark code for AWS
     Glue jobs and EMR tasks.
   Demonstrated proficiency in AWS services including S3, EC2, IAM, and RDS, and expertise in orchestration and
     data pipelines using AWS Step Functions, Data Pipeline, and Glue.
   Utilized and optimized relational databases (e.g., Microsoft SQL Server, Oracle, MySQL) and columnar
     databases (e.g., Amazon Redshift, Microsoft SQL Data Warehouse).
   Improved query performance by 35% through expert SQL optimization, resulting in faster data retrieval,
     enhanced user experience, and optimized system resources.
   Stored log files securely in AWS S3 with versioning for highly sensitive information, and integrated AWS
     DynamoDB via AWS Lambda for storing item values and backup via DynamoDB streams.
   Optimized resource utilization and adopted cost-effective AWS services, resulting in a 20% cost
     savings.
   Automated routine AWS tasks such as snapshot creation using Python scripts, and installed and configured
     Apache Airflow for AWS S3 buckets, creating DAGs to manage Airflow workflows.
   Prepared scripts to automate data ingestion using PySpark and Scala from various sources including APIs, AWS
     S3, Teradata, and Redshift.
   Implemented AWS Data Pipeline to configure data loads from S3 into Redshift, and developed Python scripts
     for data aggregation, querying, and writing data back into OLTP systems using DataFrames, SQL, and
     RDDs/MapReduce in Spark.
   Created dynamic and intuitive Tableau dashboards and reports, elevating data visualization and informed
     decision-making by 30%, and providing stakeholders with actionable insights and comprehensive analytics to
     drive business growth.

                                               EDUCATION
Master of Science (M.S.) in Computer Science                                                  August 2022 - May 2024
The George Washington University, Washington, D.C.

Bachelor of Technology (B.Tech) in Computer Science                                                July 2018 - May 2022
Gitam University, Hyderabad, Telangana, India

                                                        PROJECTS
Reddit Data Pipeline | Data Engineering
The Reddit Data Pipeline for the "AAPL" subreddit extracts, processes, and analyzes user-generated content from Reddit.
By integrating PRAW, Amazon S3, Snowflake, and Apache Airflow, the pipeline surfaces insights and trends surrounding
Apple Inc.

Stadium List Data Pipeline | Data Engineering
This project is designed to automate the process of fetching, cleaning, and processing stadium data from Wikipedia using Python
and Apache Airflow. The cleaned data is then stored in Azure Data Lake for further analysis and processing. This pipeline serves as
a foundation for data-driven insights into stadium statistics and characteristics.

WeCureIT | Software Engineer - Backend and Database Development
Led backend infrastructure development for WeCureIT's appointment booking website as Scrum Leader, fostering teamwork and
effective communication. Delivered project milestones efficiently, ensuring alignment with stakeholder requirements.
