Candidate's Name
Street Address
PHONE NUMBER AVAILABLE | EMAIL AVAILABLE | github.com/terrortad/
Education
University of Maryland
Bachelor of Science in Computer Science, College Park, Maryland
Experience
Hormel Foods, June 2019 – Present
Data Engineer Remote, VA
Technologies: Python, SQL, Google Cloud Platform, Airflow, Tableau, Informatica ETL
Optimized data pipelines, streamlining the processing and analysis of 80GB+ of weekly data from
diverse sources, including customer demographics, web activity, and supply chain/distribution data.
Implemented Apache Airflow scheduling to automate data pipeline tasks, reducing manual intervention by 40 percent
and ensuring consistent data flow for machine learning models.
Partnered with data scientists to develop a customer churn prediction model using machine learning algorithms.
Analyzed customer behavior data to identify key risk factors, leading to a 15 percent reduction in customer churn rate.
Designed a highly scalable data lake using Microsoft SQL Server to store and manage unstructured data from social
media platforms, enabling sentiment analysis and valuable customer insights.
Merged real-time shelf data from merchants into Google BigQuery using the Crisp data platform, reducing food
waste and improving inventory management through alerts on potential out-of-stock situations.
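As an illustration of the merge pattern described above (table and column names here are hypothetical, not taken from the actual project), a scheduled upsert of shelf-feed data into a BigQuery inventory table might look like:

```sql
-- Hypothetical example: upsert real-time shelf data into a BigQuery inventory table
MERGE `retail.inventory` AS target
USING `crisp.shelf_feed` AS source
ON target.store_id = source.store_id AND target.sku = source.sku
WHEN MATCHED THEN
  UPDATE SET on_shelf_qty = source.on_shelf_qty,
             updated_at = source.observed_at
WHEN NOT MATCHED THEN
  INSERT (store_id, sku, on_shelf_qty, updated_at)
  VALUES (source.store_id, source.sku, source.on_shelf_qty, source.observed_at);
```

Rows whose `on_shelf_qty` drops below a threshold after the merge can then drive the out-of-stock alerts.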
Spins LLC, March 2018 – May 2019
Data Engineer Chicago, IL
Technologies: Python, SQL, Google Cloud Platform, Airflow, MySQL, Hadoop Distributed File System (HDFS)
Reduced data release execution time by 18 percent through proactive monitoring and troubleshooting of on-premises and
cloud workflows using Python scripts.
Migrated data processing scripts from on-premises servers to Airflow, streamlining deployment and
scheduling compared to cron jobs and improving collaboration and efficiency within the data operations team.
Utilized Google Cloud Platform (GCP) tools for data operations, including troubleshooting MySQL issues within the
data pipeline and maintaining familiarity with Hadoop Distributed File System (HDFS) and logs for potential future
integration.
Maintained clear communication with product and customer success teams, proactively identifying potential impacts of
upcoming features and customer needs on data operations procedures.
Projects
Spotify ETL Data Pipeline | Python, AWS (S3, Glue, Athena), Kafka, Lambda Functions
Designed and implemented a scalable ETL pipeline on AWS to process terabytes of Spotify data in real-time, enabling
data-driven music trend analysis and artist popularity insights.
Leveraged Apache Kafka to achieve a low-latency data ingestion rate for Spotify data, ensuring continuous updates and
fresh insights.
Developed a user-friendly data analysis framework using AWS Athena, allowing analysts to run complex SQL queries on
structured data for faster and more insightful music trend identification.
Optimized data storage using AWS S3, enabling cost-effective storage of both raw and processed Spotify data for future
analysis.
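A minimal sketch of the raw/processed S3 layout implied above (the bucket layout and key scheme are assumptions, not taken from the actual project): Hive-style date partitions keep Athena scans cheap and separate raw ingests from processed outputs.

```python
from datetime import date

def s3_key(layer: str, day: date, filename: str) -> str:
    """Build a date-partitioned S3 object key for raw vs processed data.

    Hive-style year=/month=/day= partitions let Athena prune scans by date.
    """
    if layer not in {"raw", "processed"}:
        raise ValueError(f"unknown layer: {layer}")
    return (f"{layer}/year={day.year}/month={day.month:02d}/"
            f"day={day.day:02d}/{filename}")

# Example: key for a raw Spotify stream dump
print(s3_key("raw", date(2024, 6, 3), "streams.json"))
# raw/year=2024/month=06/day=03/streams.json
```

Registering the `year`/`month`/`day` columns as partition keys in the Glue catalog lets Athena queries filter on them directly.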
Reddit Post Data Pipeline | Python, SQL, AWS (Glue, Athena, Redshift), Docker
Designed and implemented a high-performance ETL pipeline for Reddit data using a modern technology stack (Apache
Airflow, Celery, AWS Glue, Athena, Redshift).
Extracted data from the Reddit API and efficiently stored it in an S3 bucket using Airflow.
Developed data transformation logic utilizing AWS Glue and Amazon Athena for active querying and cleaning.
Leveraged Docker containers to ensure a consistent and portable project environment.
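The extraction step above can be sketched as follows, assuming the standard Reddit listing JSON shape returned by the public API (the field names follow that API; the row schema chosen here is illustrative, not from the actual project):

```python
import json

def flatten_posts(listing_json: str) -> list[dict]:
    """Flatten a Reddit API listing payload into flat rows suitable
    for staging in S3 and loading into a warehouse table."""
    payload = json.loads(listing_json)
    rows = []
    for child in payload["data"]["children"]:
        post = child["data"]
        rows.append({
            "id": post["id"],
            "subreddit": post["subreddit"],
            "title": post["title"],
            "score": post["score"],
            "num_comments": post["num_comments"],
        })
    return rows

# Example with a minimal listing payload
sample = json.dumps({"data": {"children": [
    {"data": {"id": "abc", "subreddit": "python", "title": "Hello",
              "score": 42, "num_comments": 7}}
]}})
print(flatten_posts(sample))
```

In the pipeline described, an Airflow task would write these rows to S3, after which Glue and Athena handle transformation and querying before the Redshift load.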
Technical Skills
Languages: Python, SQL, T-SQL
Developer Tools: SSIS, SQL Server, GitHub, PowerShell
Databases: MS SQL, Google Cloud Platform, AWS Redshift
Technologies/Frameworks: Linux, Apache, PowerBI, Docker, Tableau, Hadoop