Candidate's Name
Sr. Data Engineer
EMAIL AVAILABLE | PHONE NUMBER AVAILABLE | LINKEDIN LINK AVAILABLE
Profile Summary
AWS-certified professional with 5+ years of industry experience in Data Engineering, Big Data/ETL
solution development, Data Visualization, and BI Reporting
Hands-on experience with AWS Glue, Lambda, Airflow, Python, PySpark, and Spark SQL, and with the
Hadoop ecosystem, including Hive, the MapReduce framework, and Hue
Experience translating ETL requirements into Spark transformations using Spark RDDs, Spark SQL, and
Spark DataFrames, and developing Python objects following the OOP programming paradigm
Experienced in working with APIs, Spark SQL, window functions, views, indexes, stored procedures,
UDFs, cursors, triggers, and CTEs
Experience ingesting CSV, text, Excel, Parquet, and JSON data, as well as data from RESTful APIs,
AWS S3, and Blob storage, and performing transformations and star/snowflake data modeling
Experience with Linux; SDKs such as Boto3 and the Slack, Box, and SharePoint SDKs; the Requests, OS,
and logging modules; parsing, file handling, and regular expressions; and Pandas and NumPy
Designed, developed, documented (in Confluence), and tested ETL jobs that populate data warehouses,
and built automation for Slack alerting, data validations, and Box reporting
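The data-validation automation mentioned above can be sketched in plain Python using only the standard csv, re, and logging modules cited in this summary; the sample feed, field names, and validation rules below are hypothetical stand-ins, not the actual production checks.

```python
import csv
import io
import logging
import re

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("etl.validation")

# Hypothetical sample of an ingested CSV feed (all names illustrative).
RAW = """order_id,email,amount
1001,alice@example.com,25.50
1002,not-an-email,10.00
1003,bob@example.com,abc
"""

EMAIL_RE = re.compile(r"^[\w.+-]+@[\w-]+\.[\w.]+$")

def validate_rows(text):
    """Split rows into (valid, rejected) lists, logging each rejection."""
    valid, rejected = [], []
    for row in csv.DictReader(io.StringIO(text)):
        errors = []
        if not EMAIL_RE.match(row["email"]):
            errors.append("bad email")
        try:
            float(row["amount"])
        except ValueError:
            errors.append("bad amount")
        if errors:
            # In the described pipeline, rejections would also trigger a
            # Slack alert; here we only log them.
            log.warning("rejected %s: %s", row["order_id"], ", ".join(errors))
            rejected.append(row)
        else:
            valid.append(row)
    return valid, rejected

valid, rejected = validate_rows(RAW)
```

Rejected rows are kept rather than dropped so a downstream step can route them to an error table or alert channel.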
AWS Certified Solution Architect
https://www.credly.com/badges/33efe5d7-eede-4ec9-8e11-59da47d21afc/linked_in_profile
Professional Experience
Client CVS Health, Dallas, TX Jan 2024 - Present
Designation: Senior Data Engineer
Extracted data from flat files, APIs, AWS S3 using AWS Glue, PySpark, Lambda, SQS, Apache Kafka,
Spark SQL, Python and ensured 99.7% data pipeline uptime using AWS CloudWatch and CloudTrail
Created Partitioned Hive compatible tables (managed and external) using Glue crawlers, and Glue
ETL jobs running on fully managed Apache Spark clusters
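Glue crawlers recognize partitions from Hive-style key layouts on S3. A minimal sketch of building such keys in Python, assuming a date-partitioned table (the prefix, table, and file names are illustrative, not the actual buckets):

```python
from datetime import date

def partition_key(prefix: str, table: str, dt: date, filename: str) -> str:
    """Build a Hive-style partitioned S3 key (year=/month=/day=), the
    layout Glue crawlers and Athena pick up as table partitions."""
    return (
        f"{prefix}/{table}/"
        f"year={dt.year}/month={dt.month:02d}/day={dt.day:02d}/"
        f"{filename}"
    )

key = partition_key("raw", "orders", date(2024, 1, 5), "part-0000.parquet")
# key == "raw/orders/year=2024/month=01/day=05/part-0000.parquet"
```

Writing data under this layout lets Athena and Redshift Spectrum prune partitions on year/month/day predicates instead of scanning the whole prefix.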
Analyzed S3 data with Athena and Redshift Spectrum via the Glue catalog, and with Snowflake stages and SQL
Enabled secure data sharing and utilized Snowflake zero-copy cloning and Time Travel for historical
queries, effective data governance, and data recovery
Provisioned and configured ETL pipelines and infrastructure using Terraform as an IaC tool
Leveraged internal and external Snowflake tables, Snowpipe, Snowflake stages, and materialized
views to optimize data processing solutions and operational workflows
Worked in Agile (Scrum) methodology with sprint planning; used Jira for story and bug tracking,
Confluence and Lucidchart for documentation, and GitHub for version control
Client Caterpillar Inc., Dallas, TX May 2023-Aug 2023
Designation: Senior Data Engineer
Worked on AWS cloud platform services such as AWS S3, Databricks, Athena, Glue, Apache Airflow, and
Redshift Spectrum
Ingested data into AWS services including the AWS S3 data lake, the Redshift data warehouse, and DynamoDB,
and processed the data using Glue, PySpark, and Spark SQL
Developed data ingestion pipelines using Airflow, Databricks Spark clusters, and Spark SQL, persisting
data into Hive tables and Snowflake views
Developed Python, SQL, and PySpark scripts incorporating robust error- and failure-handling mechanisms
Worked in Agile (Scrum) methodology with sprint planning; used Jira for story and bug tracking,
Lucidchart and Confluence for documentation, and GitHub for version control
Used Databricks PySpark jobs to transform and move large amounts of data into and out of data stores
such as Amazon S3 and Amazon Redshift
Used AWS Redshift Spectrum and Athena to query large volumes of data stored in S3 via
the Glue data catalog
Client Mitsubishi Fuso Truck of America, Richmond, VA
Designation: Big Data Engineer Jul 2020 - May 2022
Integrated Apache Airflow with AWS to develop and run environment-based ETL workflows, with jobs
submitted to the Databricks cluster
Created Hive-compatible schemas on top of data in the AWS S3 data lake; used partitioning, caching,
and persisting to optimize ad-hoc queries in Snowflake SQL, Hive queries, and Databricks SQL
Translated business needs into technical specifications, developed KPI metrics and deployed to BI
platform using SQL, Python and AWS cloud
Implemented Spark context, Spark SQL, Spark DataFrames, Pandas DataFrames, and RDDs to
optimize existing algorithms, using AWS S3 as a data lake/HDFS
Used AWS services such as EC2, SQS, SNS, IAM, S3, and DynamoDB to deploy data processing workflows,
ensuring fault tolerance, high availability, and auto-scaling with Terraform
Deployed ETL/data pipelines as code using YAML configs and Jenkins CI/CD pipelines to AWS S3 code buckets
Collaborated with cross-functional teams using Zoom, Slack, and Confluence to document data engineering
processes, and utilized Jira to track and manage stories and bugs
Developed event-based and schedule-based data pipelines using Airflow DAGs and Airflow operators such as
the Bash, Python, Email, Slack, and Snowflake operators
Client DATOMS, Bangalore, India
Designation: Data Engineer May 2019 - Jun 2020
Created and deployed SSIS packages for ETL, job scheduling, and automated email alerts, with
error/failure handling via event handlers and conditional splits
Created data dictionaries and data mappings for ETL and application support, along with DFDs, ERDs,
mapping documents, metadata, DDL, and DML
Created SQL Server configurations for SSIS packages, set up job alerts and SQL mail
agents, and scheduled SSIS packages
Implemented ETL testing scripts for different layers such as staging, ODS, and the data warehouse
Used T-SQL and various SSIS tasks such as conditional split, derived column, and lookup for data
validation checks during staging, and before loading the data into the data warehouse
Implemented scripts to automate the generation and distribution of periodic and ad hoc
reports, reducing manual effort and ensuring timely delivery of insights
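A periodic-report automation like the one above can be sketched with the Python standard library alone; the sample rows, column names, and aggregation are hypothetical, standing in for data pulled from the ODS/data-warehouse layer.

```python
import csv
import io
from collections import defaultdict

# Hypothetical staged rows; in the real pipeline these would be
# queried from the warehouse, not hard-coded.
ROWS = [
    {"region": "East", "amount": 120.0},
    {"region": "West", "amount": 80.0},
    {"region": "East", "amount": 30.0},
]

def build_report(rows):
    """Aggregate amounts per region and render a CSV report string."""
    totals = defaultdict(float)
    for r in rows:
        totals[r["region"]] += r["amount"]
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["region", "total_amount"])
    for region in sorted(totals):
        writer.writerow([region, f"{totals[region]:.2f}"])
    return buf.getvalue()

report = build_report(ROWS)
```

In a scheduled job, the returned string would be written to a file or attached to an automated email rather than kept in memory.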
Client Airtel, New Delhi, India
Designation: Data Analyst Engineer Oct 2018 - May 2019
Worked with Tableau; integrated Hive with Tableau Desktop reports and published them to Tableau Server
Involved in business requirement gathering and translating business requirements into technical design
Created interactive visualizations and drill-down reports in Tableau to explore inventory data and
identify trends and outliers, using features such as filters, groups, and calculated fields
Validated and modeled data to understand or make conclusions from the data for decision-making
Analyzed data and provided reports using advanced Excel features such as VLOOKUP, HLOOKUP, SUMIF,
PivotTables, scenario creation, and what-if analysis
Education Details
Master's in Business Analytics, UT Dallas May 2024
Bachelor's in Electrical and Electronic Engineering, NIT Delhi Jul 2018