Candidate's Name
New Jersey, USA | PHONE NUMBER AVAILABLE | EMAIL AVAILABLE | LinkedIn
SUMMARY
 Data Engineer with 5 years of experience in designing and maintaining scalable ETL pipelines using Apache Spark, Python, and SQL.
Proficient in AWS services, including Glue, Redshift, SageMaker, and DMS, as well as Azure Data Factory. Skilled in data integration,
predictive maintenance, and real-time data streaming with Apache Kafka and Amazon Kinesis. Demonstrated success in implementing
data quality and governance frameworks with Apache Atlas and AWS Glue Data Catalog. Adept at deploying machine learning models,
creating custom dashboards with Tableau, and automating pipeline monitoring with Prometheus and Grafana.

EDUCATION
New Jersey Institute of Technology, NJ, USA                                                                                   Dec 2023
Master of Science in Computer Science
Anna University, Chennai, India                                                                                               Apr 2020
Bachelor of Engineering (B.Tech.), Computer Science and Engineering

PROFESSIONAL EXPERIENCE
Humana, USA | Data Engineer                                                                                      May 2023 - Present
   Designed and maintained scalable ETL pipelines using Apache Spark, Python, and SQL. Conducted performance tuning for SQL and NoSQL
   databases as well as Google BigQuery, cutting query response times by 30%.
   Implemented workflows with AWS Glue, Amazon Redshift, and Azure Synapse, optimizing data transformation and reducing
   processing time by 40%.
   Designed and implemented Snowflake database schemas and tables, optimizing for performance and scalability.
   Utilized Snowflake's cloud data platform to manage and analyze large datasets, ensuring efficient data storage and retrieval.
   Integrated Snowflake with Python/PySpark for advanced data processing and transformation tasks.
   Used Docker for containerization and Kubernetes for orchestration to streamline the deployment and management of data applications.
   Tuned Snowflake queries and optimized performance using best practices for data warehousing and SQL execution.
   Developed and deployed AI-enhanced data quality solutions, incorporating anomaly detection algorithms and natural language
   processing (NLP) techniques to identify and resolve data inconsistencies. Utilized TensorFlow and PyTorch for model training, which
   improved data accuracy by 30% and streamlined data governance processes.
   Worked with various stakeholders, including data, design, product, and executive teams, to resolve data-related technical issues.
   Developed and maintained data pipelines using Python, leveraging Apache Airflow for workflow scheduling and orchestration and
   integrating Apache Flink for stream processing.
   Ensured GDPR and CCPA compliance by designing and implementing data pipelines with robust data anonymization, encryption, and
   audit mechanisms, enabling secure handling of sensitive data while fulfilling data subject access requests (DSARs) in a timely manner.
   Spearheaded the development and deployment of multiple machine learning and deep learning models across various business units,
   optimizing predictive analytics workflows, and reducing data processing time by 30%.
   Oversaw the installation, configuration, and maintenance of data center hardware, including servers, storage systems, and
   networking equipment, ensuring optimal performance and uptime.
   Supervised the deployment, configuration, and maintenance of servers, storage systems, and networking hardware, ensuring high
   availability, security, and optimal environmental conditions. Implemented best practices for power management, cooling, and space
   optimization to enhance data center efficiency and reliability.
   Implemented and optimized query APIs using JSON, XML, and other formats, ensuring efficient data retrieval and integration across
   systems. Employed Unix command-line interfaces and Bash scripts to automate tasks, manage system operations, and streamline
   data workflows.
   Applied in-depth knowledge of linear algebra, calculus, and probability theory to optimize model performance, achieving a 10%
   increase in model accuracy for predictive analytics solutions.
   Managed and optimized NoSQL databases, with a focus on Firestore and MongoDB, ensuring efficient data storage and retrieval.
   Managed and orchestrated complex workflows using Kubernetes and Airflow, ensuring high availability and fault tolerance in data
   processing pipelines.
   Managed provisioning, monitoring, and cost and performance optimization in cloud environments. Spearheaded the development of a
   unified customer data platform using AWS Redshift, Spark, and Snowpark, consolidating data from multiple sources and enabling
   advanced customer analytics and personalized marketing strategies.
   Led the integration of disparate healthcare data sources using FHIR standards, AWS Glue, and Apache Storm, improving data
   interoperability and patient care coordination. Implemented progressive deployment models and emerging technologies to enhance
   data engineering practices and capabilities.
   Deployed machine learning models using AWS SageMaker and Apache Airflow, boosting predictive analytics accuracy by 25%.

Sigma InfoSolutions, India | Data Engineer                                                                       Nov 2019 - Aug 2022
    Designed and implemented scalable big data architectures tailored to business needs, balancing performance requirements with
    considerations for data volume, velocity, and variety.
    Utilized tools like Elasticsearch and NiFi to enhance data processing and search capabilities.
    Applied in-depth Extract, Transform, Load (ETL) skills to orchestrate the smooth movement of data, optimizing processes for
    efficiency and maintaining data integrity throughout. Leveraged Apache Oozie for workflow scheduling and orchestration.
    Conducted data migration and optimization projects, improving data processing efficiency by 35% through Snowflake enhancements.
     Applied encryption and access controls to secure sensitive data in Hadoop clusters, Azure Databricks, and GCP BigQuery. Utilized GCP
     IAM and Azure IAM for identity and access management, and incorporated Azure Event Hubs for real-time data ingestion and
     processing. Utilized Docker and Kubernetes for scalable deployment of data engineering solutions.
    Conducted performance tuning for ETL processes, resulting in a resource utilization optimization of 20%. Utilized GCP BigQuery,
    Dataflow, and Dataproc for optimized data processing.
     Worked closely with data scientists to understand data requirements and provided support for data extraction, transformation, and
     analysis. Designed scalable data architectures that accommodate growing data volumes and evolving business needs, ensuring
     future-proof solutions.
     Designed and maintained data models for efficient data management and integration, ensuring alignment with business needs.
    Implemented monitoring solutions, resulting in a 40% reduction in data quality issues through proactive identification and
    resolution. Used Azure Monitor and GCP Cloud Monitoring for monitoring and alerting.
    Utilized Snowflake for data warehousing, implementing optimized storage and retrieval solutions to support scalable analytics and
    reporting. Integrated Elasticsearch for enhanced search functionalities and data retrieval.
    Designed and optimized data warehousing solutions on Snowflake, implementing scalable data models and query performance
    enhancements that resulted in a 30% reduction in query response times and improved data accessibility for business intelligence.
    Led the migration of on-premises data centers to cloud-based Snowflake environments, executing data extraction, transformation,
    and loading (ETL) processes with Azure Data Factory and AWS Glue. Successfully transitioned over 50TB of data with zero downtime,
    enhancing data scalability and reliability.
    Architected and optimized scalable data pipelines using Apache Spark and Snowflake, integrating real-time data streams from Kafka
    to enable near-instantaneous analytics. Improved data processing efficiency by 35% and reduced latency in real-time reporting.
    Led the migration of on-premises data infrastructures to cloud-based environments, leveraging Snowflake and AWS Glue. Executed
    data transformation and loading processes that facilitated the seamless transition of over 100TB of data, enhancing system scalability
    and performance.
    Developed and maintained scalable ETL pipelines using Azure Data Factory, Azure Databricks, and Azure Logic Apps to support
    business analytics and reporting needs.
    Utilized SQL, MongoDB, and Cosmos DB to extract, retrieve, and analyze data, bridging the gap between technical and business users
    for data-driven decision-making.
    Implemented robust data security and compliance measures, including data encryption, access control, and data masking, to
    safeguard sensitive information within Azure cloud environments.
    Employed machine learning libraries and frameworks such as TensorFlow, PyTorch, and LangChain to build and optimize models,
    resulting in a 30% reduction in training time and increased model reliability.
    Implemented robust CI/CD pipelines using Azure DevOps and GitLab, automating the build, test, and deployment processes for data
    engineering solutions. Enhanced workflow automation, reducing deployment times by 50% and ensuring high-quality, error-free
    releases.
    Developed and maintained advanced data governance frameworks, incorporating automated metadata management and data lineage
    tracking using tools like Collibra. Strengthened data integrity and compliance, achieving a 25% reduction in data discrepancies and
    improving audit readiness.
     Utilized AWS S3 with Apache Iceberg tables for scalable data storage and management, ensuring seamless integration with cloud-
     based architectures. Built and deployed web applications using the Django and Flask frameworks, integrating with relational
     databases and external APIs to deliver dynamic and user-friendly solutions.
    Designed and managed cloud-based data solutions on AWS and Azure, utilizing services such as AWS S3, Lambda, and Azure Data
    Factory. Optimized resource provisioning through Infrastructure as Code (IaC) with Terraform, leading to a 40% reduction in
    deployment time and improved resource management.
    Engineered end-to-end ETL solutions with Databricks, incorporating machine learning models for predictive analytics and anomaly
    detection. Enhanced data accuracy and processing efficiency by 30%, enabling more insightful and timely business decisions.
    Built and optimized data lakes using AWS S3 and Snowflake, integrating diverse data sources and implementing data partitioning
    strategies. Achieved a 40% increase in query performance and reduced storage costs through efficient data management practices.
    Utilized advanced data quality techniques, including AI-driven anomaly detection and data cleansing algorithms, to maintain high
    data accuracy and consistency. Improved data quality by 30%, boosting stakeholder confidence in analytical outcomes.

TECHNICAL SKILLS
Methodologies: SDLC, Agile/ Scrum, Waterfall
Languages: Python, SQL, R, Scala, Java
Python Packages: Pandas, NumPy, Matplotlib, SciPy, Scikit-Learn, Seaborn, PyTorch, TensorFlow, ggplot2, Plotly, Keras
Data Analytics Skills: Data Manipulation, Data Cleaning, Data Visualization, Exploratory Data Analysis, Data Analysis
Data Components: HDFS, Hue, MapReduce, Pig, Hive, Hudi, HCatalog, HBase, Sqoop, DBT, Impala, ZooKeeper, Flume, YARN, Cloudera
Manager, Kerberos, PySpark, Airflow, Kafka, MongoDB, Snowflake (Snowflake Console, Presto, SQL Worksheet, Snowpipe, Snowpark),
Informatica, Talend, T-SQL, and PL/SQL
Technical Skills: AWS, Azure (Databricks), GCP, Docker and Kubernetes, Smartsheet, NLP, A/B Testing, Hypothesis Testing, ETL,
Databricks, Hadoop, Spark, Snowflake, BigQuery, Microsoft Fabric, Apache Airflow
Core: Data Engineering, Data Pipeline, ETL (Extract, Transform, Load), Data Integration, Data Warehousing, Big Data, Data Modeling, Data
Migration, Data Cleansing, Data Transformation, Data Aggregation, Data Architecture, Real-Time Data Processing, Batch Processing, Data
Ingestion, Data Center, Data Governance, Data Quality, Data Catalog, Metadata Management, Distributed Systems, Data Partitioning,
Columnar Storage, Cloud Data Platforms, Data Staging, Stream Processing, OLTP/OLAP, Data Privacy & Security
Others: Critical Thinking, Communication Skills, Presentation Skills, Problem-Solving
Databases: MySQL, MS SQL Server, Oracle.
Tools: Tableau, Power BI, Advanced Excel, MS Office Suite (Excel, Visio, Access, Word, PowerPoint), Visual Studio, Git, Jupyter Notebook
Version Control: Git, GitHub
Operating Systems: Windows, macOS

CERTIFICATION
Microsoft Certified: Azure Data Engineer Associate [DP-203]
IBM Developer Skills Network Certified: Python for Data Science
Udemy Certified: Data Engineering Essentials using SQL, Python and PySpark
