Candidate's Name
Data Engineer | Transforming Data Into Insights
Secaucus, NJ 07305
PHONE NUMBER AVAILABLE
EMAIL AVAILABLE
With 9 years of experience in the tech industry, I have driven technological
innovation at several startups, playing a pivotal role in conceiving and
building products from the ground up. In programming languages and frameworks,
I work with advanced SQL techniques, Python for scripting and ETL, and
Java/Scala, particularly alongside Apache Kafka and Apache Spark. My data
processing experience covers Apache Spark in depth, real-time processing with
Apache Flink, and the broader Hadoop ecosystem. On the database and storage
side, I have worked with relational systems such as PostgreSQL and MySQL,
NoSQL platforms such as MongoDB, columnar storage formats such as Parquet, and
time-series databases such as InfluxDB. I have built on data warehousing
platforms including Amazon Redshift and Snowflake, and moved data with ETL
tools such as Apache NiFi and Apache Kafka. My data design and architecture
work has helped startups apply effective modeling techniques, from star and
snowflake schemas to sound normalization and denormalization strategies,
backed by cloud experience across AWS, Google Cloud, and Azure and by
infrastructure automation with Terraform, CloudFormation, and Pulumi.
LinkedIn
LINKEDIN LINK AVAILABLE
Skills
Traditional RDBMS like PostgreSQL, MySQL, Oracle, etc.
NoSQL databases like Cassandra, MongoDB, DynamoDB, etc.
Columnar storage systems like Parquet and ORC
Time-series databases like InfluxDB or TimescaleDB
Data warehousing solutions like Amazon Redshift, Snowflake, Google BigQuery,
and Azure Synapse Analytics
SQL: Mastery of advanced SQL concepts, window functions, stored procedures, etc.
Python: Extensive use for scripting, data manipulation, ETL tasks, etc.
Java/Scala: Used extensively alongside tools like Apache Kafka and Apache Spark
Apache Spark: Mastery of Spark Core, Spark SQL, and streaming capabilities
Apache Flink: For real-time data processing
Hadoop Ecosystem: Deep understanding of MapReduce, Hive, HBase, and
other Hadoop technologies
Data Modeling & Architecting
Star Schema and Snowflake Schema for data warehouse modeling
Normalization and denormalization techniques
Data lakes and data lakehouse architectures
Amazon Web Services: S3, EC2, EMR, Glue, Lambda, etc.
Google Cloud Platform: BigQuery, Dataflow, Pub/Sub, etc.
Microsoft Azure: Azure Data Factory, Azure Blob Storage, HDInsight, etc.
Tools like Terraform, CloudFormation, or Pulumi for infrastructure automation
Apache Kafka: For event streaming and real-time analytics
ETL tools like Talend, Informatica, Microsoft SSIS, etc.
Work History
2021-08 - Current Principal Data Integration Engineer
Stealth AI Startup (Stich Vision), Jersey City, New Jersey
Pioneered a real-time inventory tracking system using Apache Kafka for
event streaming, ensuring live updates on stock availability (see the
consumer sketch after this role).
Developed predictive models in Python to forecast demand and optimize
reordering processes, reducing stock-outs by 20% (see the forecasting
sketch after this role).
Utilized NoSQL databases like MongoDB for managing vast product
catalogs, ensuring swift data retrievals and updates.
Leveraged Apache Spark for processing and analyzing supply chain data,
identifying bottlenecks and areas of improvement.
Designed an integrated data warehouse using Snowflake, consolidating
data from various supply chain touchpoints for holistic analytics.
Architected a waste tracking system using Java, categorizing and
monitoring waste in real-time to streamline recycling processes.
Integrated sensors and IoT data with Apache Flink for real-time processing,
enabling dynamic route optimization for waste collection trucks.
Deployed time-series databases like InfluxDB to track waste generation
patterns over time (see the InfluxDB sketch after this role).
Automated infrastructure scaling on AWS using Terraform, ensuring robustness
during peak data inflows.
Introduced a data lakehouse architecture, enhancing the flexibility and
scalability of waste data storage and analysis.
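A minimal sketch of the consumer side of the inventory tracking bullet in this
role, using the kafka-python client; the topic name, broker address, and event
schema are hypothetical, since they are not specified above.

    import json

    from kafka import KafkaConsumer  # kafka-python client

    consumer = KafkaConsumer(
        "inventory-events",                    # hypothetical topic name
        bootstrap_servers=["localhost:9092"],  # hypothetical broker address
        value_deserializer=lambda v: json.loads(v.decode("utf-8")),
        auto_offset_reset="latest",
    )

    stock = {}  # in-memory view of units on hand per SKU

    for event in consumer:        # blocks, yielding one record at a time
        update = event.value      # e.g. {"sku": "A123", "delta": -2}
        stock[update["sku"]] = stock.get(update["sku"], 0) + update["delta"]
        print(update["sku"], stock[update["sku"]], "units on hand")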
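The demand forecasting bullet reflects a standard lag-feature regression
setup. A minimal sketch with scikit-learn; the demand figures and the
three-day window are made up for illustration.

    import numpy as np
    from sklearn.linear_model import LinearRegression

    daily_demand = np.array([12, 15, 14, 18, 20, 19, 23, 25, 24, 28],
                            dtype=float)

    LAGS = 3  # predict each day's demand from the previous 3 days
    X = np.array([daily_demand[i - LAGS:i]
                  for i in range(LAGS, len(daily_demand))])
    y = daily_demand[LAGS:]

    model = LinearRegression().fit(X, y)

    # Forecast the next day from the most recent window; in practice this
    # number would feed a reorder-point calculation.
    next_day = model.predict(daily_demand[-LAGS:].reshape(1, -1))[0]
    print(f"forecast: {next_day:.1f} units")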
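Recording waste-generation patterns in InfluxDB, per the time-series bullet
above, reduces to writing tagged points. A sketch assuming InfluxDB 2.x and
the official influxdb-client package; the URL, token, org, bucket, and field
names are hypothetical.

    from influxdb_client import InfluxDBClient, Point
    from influxdb_client.client.write_api import SYNCHRONOUS

    client = InfluxDBClient(url="http://localhost:8086",
                            token="TOKEN", org="ops")
    write_api = client.write_api(write_options=SYNCHRONOUS)

    point = (
        Point("waste_generated")    # measurement name (hypothetical)
        .tag("route", "route-7")    # which collection route produced it
        .field("kilograms", 142.5)  # observed waste mass
    )
    write_api.write(bucket="waste-metrics", record=point)
    client.close()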
2018-01 - 2021-01 Data Engineering Team Lead
Horizon Technologies (Addo AI), San Francisco, California
Conceptualized and developed a user health data platform, capturing
fitness metrics using Apache Kafka streams.
Implemented data lakes on Google Cloud Platform, storing diverse user data
like heart rates, step counts, and diet logs.
Leveraged Python scripts for ETL processes, cleaning and transforming
wearable device data for analysis (see the ETL sketch after this role).
Employed Apache Spark's machine learning libraries to create personalized
workout and diet plans.
Designed a responsive querying system using advanced SQL techniques,
providing instant insights into user fitness trends.
Optimized CRM databases, primarily using PostgreSQL, ensuring swift data
retrieval and efficient storage.
Integrated real-time customer interaction data using Apache Kafka,
enhancing the responsiveness of sales and support teams.
Employed Star and Snowflake Schemas to structure customer data,
facilitating faster report generation.
Streamlined customer segmentation using clustering techniques in Apache
Spark, enabling targeted marketing campaigns (see the clustering sketch
after this role).
Automated data integration pipelines using Apache NiFi, ensuring timely
synchronization of CRM data across platforms.
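A minimal sketch of the kind of Python ETL step described in the wearable-data
bullet above, using pandas; the file names, column names, and heart-rate
validity range are hypothetical.

    import pandas as pd

    raw = pd.read_csv("wearable_readings.csv")  # hypothetical extract

    clean = (
        raw.dropna(subset=["user_id", "timestamp"])   # drop unusable rows
           .assign(timestamp=lambda df: pd.to_datetime(df["timestamp"]))
           .query("30 <= heart_rate <= 220")          # discard sensor noise
           .drop_duplicates(subset=["user_id", "timestamp"])
    )

    # Aggregate to one row per user per hour for downstream analysis.
    hourly = (
        clean.set_index("timestamp")
             .groupby("user_id")
             .resample("1h")[["heart_rate", "steps"]]
             .mean()
    )
    hourly.to_parquet("wearable_hourly.parquet")  # needs pyarrow installed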
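A minimal sketch of the segmentation work above, using Spark MLlib's KMeans;
the feature columns, input path, and choice of k = 4 are hypothetical.

    from pyspark.sql import SparkSession
    from pyspark.ml.feature import VectorAssembler
    from pyspark.ml.clustering import KMeans

    spark = SparkSession.builder.appName("crm-segmentation").getOrCreate()

    customers = spark.read.parquet("s3://crm/customers/")  # hypothetical path

    # Pack the numeric columns into the single vector column MLlib expects.
    features = VectorAssembler(
        inputCols=["recency_days", "order_count", "lifetime_value"],
        outputCol="features",
    ).transform(customers)

    model = KMeans(k=4, seed=42, featuresCol="features").fit(features)
    segmented = model.transform(features)  # adds a 'prediction' segment column
    segmented.groupBy("prediction").count().show()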
2015-01 - 2018-01 Data Engineer
Mercurial Minds
Developed a comprehensive document management system, with real-time
indexing and retrieval capabilities powered by Apache Kafka and Apache
Flink.
Utilized columnar storage systems like Parquet for efficient storage and
retrieval of large documents (see the Parquet sketch after this role).
Automated document versioning and backup processes on AWS using
CloudFormation templates.
Designed a robust search engine using Python, enabling users to quickly find
documents based on content, metadata, and tags (see the index sketch after
this role).
Integrated OCR capabilities, transforming scanned documents into
searchable and editable formats.
Led the development of a collaborative platform, allowing real-time data
sharing and interaction using WebSockets and Apache Kafka.
Facilitated seamless integration of third-party tools using Python scripting for
ETL tasks.
Leveraged AWS services like Lambda and EC2 for on-demand scalability
during peak collaboration hours.
Engineered a data backup system on Azure Blob Storage, ensuring data
integrity and availability.
Introduced real-time analytics on collaboration patterns using Apache
Spark, providing insights to teams on productivity metrics.
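The Parquet bullet in this role hints at why columnar storage helps: readers
fetch only the columns they need. A minimal sketch with pyarrow; the schema
and file paths are hypothetical.

    import pyarrow as pa
    import pyarrow.parquet as pq

    table = pa.table({
        "doc_id": [1, 2, 3],
        "title":  ["spec", "invoice", "contract"],
        "body":   ["...", "...", "..."],
        "tags":   [["eng"], ["finance"], ["legal"]],
    })

    pq.write_table(table, "documents.parquet", compression="snappy")

    # Column pruning: read two small columns without touching the large bodies.
    titles = pq.read_table("documents.parquet", columns=["doc_id", "title"])
    print(titles.to_pydict())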
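At its core, a content search engine like the one described in this role rests
on an inverted index mapping tokens to document ids. A minimal pure-Python
sketch with hypothetical documents and naive whitespace tokenization.

    from collections import defaultdict

    def build_index(docs: dict[int, str]) -> dict[str, set[int]]:
        """Map each token to the set of document ids containing it."""
        index: dict[str, set[int]] = defaultdict(set)
        for doc_id, text in docs.items():
            for token in text.lower().split():
                index[token].add(doc_id)
        return index

    def search(index: dict[str, set[int]], query: str) -> set[int]:
        """Return ids of documents containing every query token (AND search)."""
        hits = [index.get(token, set()) for token in query.lower().split()]
        return set.intersection(*hits) if hits else set()

    docs = {1: "quarterly waste report", 2: "waste route contract",
            3: "hr handbook"}
    index = build_index(docs)
    print(search(index, "waste report"))  # -> {1}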
Education
BS: Information Technology, Software Engineering - University of Management &
Technology