Candidate's Name
SUMMARY
With 9 years of experience in the tech industry, I have consistently driven technological innovation in several
startups, from concept to product launch. I specialize in advanced SQL, Python scripting for ETL, and
Java/Scala integrated with Apache Kafka and Apache Spark. My expertise covers real-time data processing
with Apache Flink, Hadoop ecosystems, and managing RDBMS like PostgreSQL and MySQL, along with
NoSQL platforms such as MongoDB. Skilled in data warehousing with Redshift, Snowflake, and Databricks, I
have designed and scaled data architectures using Star/Snowflake schemas on AWS, Google Cloud, and Azure,
optimizing processes with tools like Terraform and CloudFormation.
SKILLS
Programming Languages & Frameworks: Advanced SQL Techniques, Python for Scripting & ETL, Java/Scala for Apache Kafka & Apache Spark
Data Processing Systems: Apache Spark Core & SQL, Databricks, Apache Flink for Real-Time Processing, Hadoop Ecosystem (MapReduce, Hive, HBase)
Database & Storage Solutions: Traditional RDBMS (PostgreSQL, MySQL, Oracle), NoSQL Databases (Cassandra, MongoDB, DynamoDB), Columnar Storage (Parquet, ORC), Time-Series Databases (InfluxDB, TimescaleDB)
Data Warehousing: Amazon Redshift, Snowflake, Google BigQuery, Azure Synapse Analytics, Databricks, Data Lake & Data Lakehouse Architectures
SQL Mastery: Advanced SQL Concepts, Window Functions, Stored Procedures
Python Expertise: Scripting, Data Manipulation, ETL Tasks
Java/Scala: Especially for Apache Kafka & Apache Spark
Data Modeling & Architecting: Star Schema, Snowflake Schema, Normalization & Denormalization Techniques, Data Lakes & Data Lakehouse Architectures, Data Architecture
Cloud Solutions: AWS (S3, EC2, EMR, Glue, Lambda), Google Cloud Platform (BigQuery, Dataflow, Pub/Sub), Microsoft Azure (Azure Data Factory, Azure Blob Storage, HDInsight)
Infrastructure Automation Tools: Terraform, CloudFormation, Pulumi
Event Streaming & Real-Time Analytics: Apache Kafka
ETL Tools: Apache NiFi, Talend, Informatica, Microsoft SSIS
Data Analysis: SQL querying, data cleaning, statistical analysis
Machine Learning: TensorFlow, Scikit-learn (model deployment)
Data Governance: Compliance (GDPR, HIPAA)
PROFESSIONAL EXPERIENCE
Stealth AI Startup (Stitch Vision) Jersey City, New Jersey
Principal Data Integration Engineer AUG-2021 - Present
Pioneered a real-time inventory tracking system using Apache Kafka for event streaming, ensuring live
updates on stock availability. Integrated Databricks to enhance real-time data analytics and streamline
event processing.
Developed predictive models in Python to forecast demand and optimize reordering processes, reducing
stock-outs by 20%.
Utilized NoSQL databases like MongoDB for managing vast product catalogs, ensuring swift data retrievals and updates.
Leveraged Apache Spark for processing and analyzing supply chain data, identifying bottlenecks and areas
of improvement.
Designed an integrated data warehouse using Snowflake, consolidating data from various supply chain
touchpoints for holistic analytics. Employed Data Architecture principles to ensure a scalable and flexible
data environment.
Architected a waste tracking system using Java, categorizing and monitoring waste in real time to streamline recycling processes.
Integrated sensors and IoT data with Apache Flink for real-time processing, enabling dynamic route optimization for waste collection trucks.
Deployed time-series databases like InfluxDB to track waste generation patterns over time.
Automated infrastructure scaling on AWS using Terraform, ensuring robustness during peak data inflows.
Introduced a Data Lakehouse architecture, enhancing the flexibility and scalability of waste data storage and analysis.
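A minimal sketch of the consumer-side logic behind a Kafka-driven inventory tracker like the one described above. The event shape and function name are assumptions for illustration; the real system would deserialize events from a Kafka topic rather than iterate over a Python list:

```python
def apply_stock_events(inventory, events):
    """Apply a stream of stock-change events to an in-memory inventory.

    Each event is a dict with a SKU and a signed quantity delta, e.g.
    {"sku": "A-100", "delta": -2} for a sale of two units. (Event shape
    is invented for this sketch; a real consumer would deserialize
    Kafka messages instead.)
    """
    for event in events:
        sku = event["sku"]
        inventory[sku] = inventory.get(sku, 0) + event["delta"]
        if inventory[sku] < 0:
            # Flag oversells rather than silently going negative.
            raise ValueError(f"stock for {sku} went negative")
    return inventory

inventory = apply_stock_events(
    {"A-100": 5},
    [{"sku": "A-100", "delta": -2}, {"sku": "B-200", "delta": 10}],
)
```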
Horizon Technologies (Addo AI) San Francisco, CA
Data Engineering Team Lead JAN-2018 - JAN-2021
Conceptualized and developed a user health data platform, capturing fitness metrics using Apache Kafka
streams.
Implemented data lakes on the Google Cloud Platform, storing diverse user data like heart rates, step
counts, and diet logs.
Leveraged Python scripts for ETL processes, cleaning and transforming wearable device data for analysis.
Employed Apache Spark's machine-learning libraries to create personalized workout and diet plans. Designed a responsive querying system using advanced SQL techniques, providing instant insights into user fitness trends.
Optimized CRM databases, primarily using PostgreSQL, ensuring swift data retrieval and efficient storage.
Integrated real-time customer interaction data using Apache Kafka, enhancing the responsiveness of sales
and support teams.
Employed Star and Snowflake Schemas to structure customer data, facilitating faster report generation.
Streamlined customer segmentation using clustering techniques in Apache Spark, enabling targeted marketing campaigns.
Automated data integration pipelines using Apache NiFi, ensuring timely synchronization of CRM data across platforms.
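The ETL cleaning step for wearable-device data mentioned above can be sketched as a small Python transform. The field names and validity thresholds are assumptions invented for the example, not the production schema:

```python
def clean_heart_rate_records(records):
    """Drop malformed wearable readings and normalize the rest.

    Keeps records with a numeric heart rate in a plausible range
    (30-220 bpm) and a non-empty user id; field names here are
    invented for illustration.
    """
    cleaned = []
    for rec in records:
        hr = rec.get("heart_rate")
        user = rec.get("user_id")
        if not user or not isinstance(hr, (int, float)):
            continue
        if not 30 <= hr <= 220:
            continue
        cleaned.append({"user_id": user, "heart_rate": float(hr)})
    return cleaned

raw = [
    {"user_id": "u1", "heart_rate": 72},
    {"user_id": "", "heart_rate": 80},        # missing user id -> dropped
    {"user_id": "u2", "heart_rate": "high"},  # non-numeric -> dropped
    {"user_id": "u3", "heart_rate": 500},     # implausible value -> dropped
]
cleaned = clean_heart_rate_records(raw)
```

In a pipeline, a transform like this would sit between extraction from the device API and the load into the warehouse.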
Mercurial Minds Hoboken, New Jersey
Data Engineer JAN-2015 - JAN-2018
Developed a comprehensive document management system, with real-time indexing and retrieval capabilities powered by Apache Kafka and Apache Flink.
Utilized columnar storage systems like Parquet for efficient storage and retrieval of large documents. Automated document versioning and backup processes on AWS using CloudFormation templates. Designed a robust search engine using Python, enabling users to quickly find documents based on content, metadata, and tags.
Integrated OCR capabilities, transforming scanned documents into searchable and editable formats. Led the development of a collaborative platform, allowing real-time data sharing and interaction using WebSockets and Apache Kafka.
Facilitated seamless integration of third-party tools using Python scripting for ETL tasks.
Leveraged AWS services like Lambda and EC2 for on-demand scalability during peak collaboration hours.
Engineered a data backup system on Azure Blob Storage, ensuring data integrity and availability.
Introduced real-time analytics on collaboration patterns using Apache Spark, providing insights to teams on productivity metrics.
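The content/metadata search described above rests on an inverted index. A minimal sketch in pure Python — document ids, tokenization, and function names are all invented for illustration, and a production engine would add stemming, ranking, and persistence:

```python
from collections import defaultdict

def build_index(docs):
    """Build an inverted index from lowercase word to the set of doc ids.

    `docs` maps a document id to its text; tokenization is a simple
    whitespace split, chosen for brevity.
    """
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index

def search(index, query):
    """Return ids of documents containing every word in the query."""
    words = query.lower().split()
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results

docs = {
    "doc1": "quarterly revenue report",
    "doc2": "engineering onboarding report",
}
index = build_index(docs)
hits = search(index, "report")
```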
EDUCATION
University of Management and Technology
Bachelor of Science in Computer Science