Candidate's Name
SUMMARY
With 9 years of experience in the tech industry, I have consistently driven technological innovation in several
startups, from concept to product launch. I specialize in advanced SQL, Python scripting for ETL, and
Java/Scala integrated with Apache Kafka and Apache Spark. My expertise covers real-time data processing
with Apache Flink, Hadoop ecosystems, and managing RDBMS like PostgreSQL and MySQL, along with
NoSQL platforms such as MongoDB. Skilled in data warehousing with Redshift, Snowflake, and Databricks, I
have designed and scaled data architectures using Star/Snowflake schemas on AWS, Google Cloud, and Azure,
optimizing processes with tools like Terraform and CloudFormation.
SKILLS
Programming Languages & Frameworks: Advanced SQL Techniques, Python for Scripting & ETL, Java/Scala for Apache Kafka & Apache Spark
Data Processing Systems: Apache Spark Core & SQL, Databricks, Apache Flink for Real-Time Processing, Hadoop Ecosystem (MapReduce, Hive, HBase)
Database & Storage Solutions: Traditional RDBMS (PostgreSQL, MySQL, Oracle), NoSQL Databases (Cassandra, MongoDB, DynamoDB), Columnar Storage (Parquet, ORC), Time-Series Databases (InfluxDB, TimescaleDB)
Data Warehousing: Amazon Redshift, Snowflake, Google BigQuery, Azure Synapse Analytics, Databricks, Data Lake & Data Lakehouse Architectures
SQL Mastery: Advanced SQL Concepts, Window Functions, Stored Procedures
Python Expertise: Scripting, Data Manipulation, ETL Tasks
Java/Scala: Especially for Apache Kafka & Apache Spark
Data Modeling & Architecting: Star Schema, Snowflake Schema, Normalization & Denormalization Techniques, Data Lakes & Data Lakehouse Architectures, Data Architecture
Cloud Solutions: AWS (S3, EC2, EMR, Glue, Lambda), Google Cloud Platform (BigQuery, Dataflow, Pub/Sub), Microsoft Azure (Azure Data Factory, Azure Blob Storage, HDInsight)
Infrastructure Automation Tools: Terraform, CloudFormation, Pulumi
Event Streaming & Real-Time Analytics: Apache Kafka
ETL Tools: Apache NiFi, Talend, Informatica, Microsoft SSIS
Data Analysis: SQL querying, data cleaning, statistical analysis
Machine Learning: TensorFlow, Scikit-learn (model deployment)
Data Governance: Compliance (GDPR, HIPAA)
PROFESSIONAL EXPERIENCE
Stealth AI Startup (Stitch Vision) Jersey City, New Jersey
Principal Data Integration Engineer AUG-2021 - Present
Pioneered a real-time inventory tracking system using Apache Kafka for event streaming, ensuring live
updates on stock availability. Integrated Databricks to enhance real-time data analytics and streamline
event processing.
Developed predictive models in Python to forecast demand and optimize reordering processes, reducing
stock-outs by 20%.
Utilized NoSQL databases like MongoDB for managing vast product catalogs, ensuring swift data retrievals and updates.
Leveraged Apache Spark for processing and analyzing supply chain data, identifying bottlenecks and areas
of improvement.
Designed an integrated data warehouse using Snowflake, consolidating data from various supply chain
touchpoints for holistic analytics. Employed Data Architecture principles to ensure a scalable and flexible
data environment.
Architected a waste tracking system using Java, categorizing and monitoring waste in real time to streamline recycling processes.
Integrated sensors and IoT data with Apache Flink for real-time processing, enabling dynamic route optimization for waste collection trucks.
Deployed time-series databases like InfluxDB to track waste generation patterns over time.
Automated infrastructure scaling on AWS using Terraform, ensuring robustness during peak data inflows.
Introduced a Data Lakehouse architecture, enhancing the flexibility and scalability of waste data storage and analysis.
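A minimal sketch of the consumer-side logic behind a Kafka-driven inventory tracker like the one described above. The event shape and function name are assumptions for illustration; the real system would deserialize events from a Kafka topic rather than iterate over a Python list:

```python
def apply_stock_events(inventory, events):
    """Apply a stream of stock-change events to an in-memory inventory.

    Each event is a dict with a SKU and a signed quantity delta, e.g.
    {"sku": "A-100", "delta": -2} for a sale of two units. (Event shape
    is invented for this sketch; a real consumer would deserialize
    Kafka messages instead.)
    """
    for event in events:
        sku = event["sku"]
        inventory[sku] = inventory.get(sku, 0) + event["delta"]
        if inventory[sku] < 0:
            # Flag oversells rather than silently going negative.
            raise ValueError(f"stock for {sku} went negative")
    return inventory

inventory = apply_stock_events(
    {"A-100": 5},
    [{"sku": "A-100", "delta": -2}, {"sku": "B-200", "delta": 10}],
)
```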
Horizon Technologies (Addo AI) San Francisco, CA
Data Engineering Team Lead JAN-2018 - JAN-2021
Conceptualized and developed a user health data platform, capturing fitness metrics using Apache Kafka
streams.
Implemented data lakes on the Google Cloud Platform, storing diverse user data like heart rates, step
counts, and diet logs.
Leveraged Python scripts for ETL processes, cleaning and transforming wearable device data for analysis.
Employed Apache Spark's machine-learning libraries to create personalized workout and diet plans. Designed a responsive querying system using advanced SQL techniques, providing instant insights into user fitness trends.
Optimized CRM databases, primarily using PostgreSQL, ensuring swift data retrieval and efficient storage.
Integrated real-time customer interaction data using Apache Kafka, enhancing the responsiveness of sales
and support teams.
Employed Star and Snowflake Schemas to structure customer data, facilitating faster report generation.
Streamlined customer segmentation using clustering techniques in Apache Spark, enabling targeted marketing campaigns.
Automated data integration pipelines using Apache NiFi, ensuring timely synchronization of CRM data across platforms.
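The ETL cleaning step for wearable-device data mentioned above can be sketched as a small Python transform. The field names and validity thresholds are assumptions invented for the example, not the production schema:

```python
def clean_heart_rate_records(records):
    """Drop malformed wearable readings and normalize the rest.

    Keeps records with a numeric heart rate in a plausible range
    (30-220 bpm) and a non-empty user id; field names here are
    invented for illustration.
    """
    cleaned = []
    for rec in records:
        hr = rec.get("heart_rate")
        user = rec.get("user_id")
        if not user or not isinstance(hr, (int, float)):
            continue
        if not 30 <= hr <= 220:
            continue
        cleaned.append({"user_id": user, "heart_rate": float(hr)})
    return cleaned

raw = [
    {"user_id": "u1", "heart_rate": 72},
    {"user_id": "", "heart_rate": 80},        # missing user id -> dropped
    {"user_id": "u2", "heart_rate": "high"},  # non-numeric -> dropped
    {"user_id": "u3", "heart_rate": 500},     # implausible value -> dropped
]
cleaned = clean_heart_rate_records(raw)
```

In a pipeline, a transform like this would sit between extraction from the device API and the load into the warehouse.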
Mercurial Minds Hoboken, New Jersey
Data Engineer JAN-2015 - JAN-2018
Developed a comprehensive document management system, with real-time indexing and retrieval capabilities powered by Apache Kafka and Apache Flink.
Utilized columnar storage systems like Parquet for efficient storage and retrieval of large documents. Automated document versioning and backup processes on AWS using CloudFormation templates. Designed a robust search engine using Python, enabling users to quickly find documents based on content, metadata, and tags.
Integrated OCR capabilities, transforming scanned documents into searchable and editable formats. Led the development of a collaborative platform, allowing real-time data sharing and interaction using WebSockets and Apache Kafka.
Facilitated seamless integration of third-party tools using Python scripting for ETL tasks.
Leveraged AWS services like Lambda and EC2 for on-demand scalability during peak collaboration hours.
Engineered a data backup system on Azure Blob Storage, ensuring data integrity and availability.
Introduced real-time analytics on collaboration patterns using Apache Spark, providing insights to teams on productivity metrics.
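The content/metadata search described above rests on an inverted index. A minimal sketch in pure Python — document ids, tokenization, and function names are all invented for illustration, and a production engine would add stemming, ranking, and persistence:

```python
from collections import defaultdict

def build_index(docs):
    """Build an inverted index from lowercase word to the set of doc ids.

    `docs` maps a document id to its text; tokenization is a simple
    whitespace split, chosen for brevity.
    """
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index

def search(index, query):
    """Return ids of documents containing every word in the query."""
    words = query.lower().split()
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results

docs = {
    "doc1": "quarterly revenue report",
    "doc2": "engineering onboarding report",
}
index = build_index(docs)
hits = search(index, "report")
```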
EDUCATION
University of Management and Technology
Bachelor of Science in Computer Science