Candidate's Name Secaucus, NJ 07305 PHONE NUMBER AVAILABLE
EMAIL AVAILABLE LINKEDIN LINK AVAILABLE
Data Engineer | Transforming Data into Insights | Principal Data Integration Engineer
Real-Time Processing | Predictive Modeling | Cloud Solutions | Automated Data Warehousing
With 9 years of intensive experience in the tech industry, I have had the distinct privilege of driving
technological innovation within several startups, playing a pivotal role in conceptualizing and actualizing
products from the ground up. My expertise spans a vast array of cutting-edge technologies and systems. In
the realm of Programming Languages & Frameworks, I have harnessed advanced SQL techniques, leveraged
Python for diverse scripting and ETL processes, and employed Java/Scala, especially in conjunction with
platforms like Apache Kafka and Apache Spark. My work in Data Processing Systems is marked by a deep-
seated proficiency in Apache Spark, real-time data processing with Apache Flink, and a thorough grasp of the
Hadoop ecosystem. Within the sphere of Database & Storage Solutions, I have navigated the intricacies of
various RDBMS such as PostgreSQL and MySQL, tapped into the capabilities of NoSQL platforms like
MongoDB, and worked with columnar storage formats like Parquet and time-series databases like InfluxDB. I have
an extensive background in Data Warehousing Technologies, familiarizing myself with powerhouse
platforms like Amazon Redshift and Snowflake. This experience is complemented by my adeptness in Data
Movement & ETL Instruments, having worked extensively with tools like Apache NiFi, Apache Kafka, and
more. My skills in Data Design & Architecture have empowered startups to leverage effective data modeling
techniques, from Star and Snowflake Schemas to adept normalization/denormalization strategies. This
foundation is further reinforced by my expertise in cloud solutions across AWS, Google Cloud, and Azure
platforms, as well as Infrastructure Automation Tools, streamlining processes using Terraform,
CloudFormation, and Pulumi.
Professional Experience
Stealth AI Startup (Stitch Vision), New Jersey City
Principal Data Integration Engineer 08/2021-Present
Pioneered a real-time inventory tracking system using Apache Kafka for event streaming, ensuring
live updates on stock availability.
Developed predictive models in Python to forecast demand and optimize reordering processes,
reducing stock-outs by 20%.
Utilized NoSQL databases like MongoDB for managing vast product catalogs, ensuring swift data
retrievals and updates.
Leveraged Apache Spark for processing and analyzing supply chain data, identifying bottlenecks and
areas of improvement.
Designed an integrated data warehouse using Snowflake, consolidating data from various supply
chain touchpoints for holistic analytics.
Architected a waste tracking system using Java, categorizing and monitoring waste in real time to
streamline recycling processes.
Integrated sensors and IoT data with Apache Flink for real-time processing, enabling dynamic route
optimization for waste collection trucks.
Deployed time-series databases like InfluxDB to track waste generation patterns over time.
Automated infrastructure scaling on AWS using Terraform, ensuring robustness during peak data
inflows.
Introduced a data Lakehouse architecture, enhancing the flexibility and scalability of waste data
storage and analysis.
Horizon Technologies (Addo AI), San Francisco CA
Data Engineering Team Lead 01/2018-01/2021
Conceptualized and developed a user health data platform, capturing fitness metrics using Apache
Kafka streams.
Implemented data lakes on the Google Cloud Platform, storing diverse user data like heart rates, step
counts, and diet logs.
Leveraged Python scripts for ETL processes, cleaning and transforming wearable device data for
analysis.
Employed Apache Spark's machine-learning libraries to create personalized workout and diet plans.
Designed a responsive querying system using advanced SQL techniques, providing instant insights
into user fitness trends.
Optimized CRM databases, primarily using PostgreSQL, ensuring swift data retrieval and efficient
storage.
Integrated real-time customer interaction data using Apache Kafka, enhancing the responsiveness of
sales and support teams.
Employed Star and Snowflake Schemas to structure customer data, facilitating faster report
generation.
Streamlined customer segmentation using clustering techniques in Apache Spark, enabling targeted
marketing campaigns.
Automated data integration pipelines using Apache NiFi, ensuring timely synchronization of CRM
data across platforms.
Mercurial Minds
Data Engineer 01/2015-01/2018
Developed a comprehensive document management system, with real-time indexing and retrieval
capabilities powered by Apache Kafka and Apache Flink.
Utilized columnar storage formats like Parquet for efficient storage and retrieval of large documents.
Automated document versioning and backup processes on AWS using CloudFormation templates.
Designed a robust search engine using Python, enabling users to quickly find documents based on
content, metadata, and tags.
Integrated OCR capabilities, transforming scanned documents into searchable and editable formats.
Led the development of a collaborative platform, allowing real-time data sharing and interaction
using WebSockets and Apache Kafka.
Facilitated seamless integration of third-party tools using Python scripting for ETL tasks.
Leveraged AWS services like Lambda and EC2 for on-demand scalability during peak collaboration
hours.
Engineered a data backup system on Azure Blob Storage, ensuring data integrity and availability.
Introduced real-time analytics on collaboration patterns using Apache Spark, providing insights to
teams on productivity metrics.
Education
Bachelor's in Information Technology, Software Engineering,
University of Management and Technology
Skills
Programming Languages & Frameworks: Advanced SQL Techniques | Python for Scripting & ETL |
Java/Scala for Apache Kafka & Apache Spark
Data Processing Systems: Apache Spark Core & SQL | Apache Flink for Real-Time Processing | Hadoop
Ecosystem (MapReduce, Hive, HBase)
Database & Storage Solutions: Traditional RDBMS (PostgreSQL, MySQL, Oracle) | NoSQL Databases
(Cassandra, MongoDB, DynamoDB) | Columnar Storage (Parquet, ORC) | Time-Series Databases (InfluxDB,
TimescaleDB)
Data Warehousing: Amazon Redshift | Snowflake | Google BigQuery | Azure Synapse Analytics
SQL Mastery: Advanced SQL Concepts | Window Functions | Stored Procedures
Python Expertise: Scripting | Data Manipulation | ETL Tasks
Java/Scala: Especially for Apache Kafka & Apache Spark
Data Modeling & Architecting: Star Schema | Snowflake Schema | Normalization & Denormalization
Techniques | Data Lakes & Data Lakehouse Architectures
Cloud Solutions: AWS (S3, EC2, EMR, Glue, Lambda) | Google Cloud Platform (BigQuery, Dataflow, Pub/Sub)
| Microsoft Azure (Azure Data Factory, Azure Blob Storage, HDInsight)
Infrastructure Automation Tools: Terraform | CloudFormation | Pulumi
Event Streaming & Real-Time Analytics: Apache Kafka
ETL Tools: Apache NiFi | Talend | Informatica | Microsoft SSIS