Candidate's Name
Contact No: PHONE NUMBER AVAILABLE
Email: EMAIL AVAILABLE
LinkedIn: Candidate's Name | LinkedIn
Professional Summary
Microsoft Certified Data Engineer with expertise in designing data-intensive applications using the Hadoop ecosystem and Big Data analytics, cloud data engineering (AWS, Azure), data visualization, data warehousing, reporting, business intelligence, and ETL.
10+ years of experience with Hadoop ecosystem components such as MapReduce, Pig, and Hive for analysis, and Sqoop and Flume for data import/export.
Experience in data migration from on-premises systems to the Azure and AWS clouds.
Set up AWS and Microsoft Azure environments with Databricks workspaces for business analytics.
Experience implementing the Databricks Delta Lake architecture (Bronze, Silver, and Gold layers) and Delta Live Tables (DLT).
Developed ETL transformations and validations using Spark SQL/Spark DataFrames with Azure Databricks and Azure Data Factory for distributed data processing and transformation tasks.
Hands-on experience with Azure components such as ADF, ADLS, ADB, Synapse, Azure SQL DB, Logic Apps, Azure Functions, Key Vault, and Integration Runtime.
Experience creating pipeline jobs, schedule triggers, and Mapping Data Flows in Azure Data Factory (V2), using Key Vault to store credentials.
Worked with PySpark using Spark SQL, the DataFrame API, and Spark Streaming to improve the efficiency of existing Hadoop approaches and build end-to-end data pipelines.
Hands-on experience with AWS components such as EMR, EC2, S3, RDS, IAM, Auto Scaling, CloudWatch, SNS, Athena, Glue, Kinesis, Lambda, Redshift, CloudFront, and DynamoDB, used to maintain a secure environment in the AWS public cloud.
Worked on Snowflake modelling using data warehousing techniques: data cleansing, Slowly Changing Dimensions, surrogate key assignment, and change data capture.
Experienced with Snowpipe, Data Sharing, and database, schema, and table structures in Snowflake.
Designed and developed logical and physical data models that utilize concepts such as Star
Schema, Snowflake Schema and Slowly Changing Dimensions.
Exposure to multiple Hadoop distributions like Cloudera and Hortonworks platforms.
Good experience building real-time streaming pipelines using Kafka and Spark Streaming (a brief sketch follows this summary).
Implemented data quality, data governance, and data migration in various Big Data projects.
Hands-on experience with application servers such as WebLogic 8.1, Tomcat 6, JBoss 7.0, and WAS 7.0.
Experienced with IDE tools such as Eclipse, PyCharm, IBM RSA, and Databricks Notebook.
Implemented MVC framework in projects using Spring, Struts and Hibernate.
Good knowledge in relational database systems (Oracle, DB2, MS-SQL, and MySQL).
Experienced in web design using HTML, CSS, JavaScript and jQuery.
Experienced in coordinating with the business, requirement gathering, technical walk-throughs, and preparing functional and technical documents.
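For illustration only, a minimal sketch of the kind of Kafka-to-Delta streaming pipeline referenced above; the broker address, topic name, event schema, and paths are assumptions (not details from any specific engagement), and a Databricks/Delta runtime is assumed.

# Minimal sketch: read a Kafka topic with Spark Structured Streaming and land it in Delta.
from pyspark.sql import SparkSession, functions as F
from pyspark.sql.types import StructType, StructField, StringType, DoubleType, TimestampType

spark = SparkSession.builder.appName("kafka-streaming-sketch").getOrCreate()

# Assumed event schema for the JSON payload on the topic.
event_schema = StructType([
    StructField("event_id", StringType()),
    StructField("amount", DoubleType()),
    StructField("event_ts", TimestampType()),
])

# Read the Kafka topic as a streaming DataFrame and parse the JSON payload.
events = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "<broker>:9092")
    .option("subscribe", "<topic>")
    .load()
    .select(F.from_json(F.col("value").cast("string"), event_schema).alias("e"))
    .select("e.*")
)

# Write the parsed stream to a Delta location with checkpointing for recovery.
query = (
    events.writeStream.format("delta")
    .option("checkpointLocation", "/checkpoints/events")
    .outputMode("append")
    .start("/delta/bronze/events")
)
query.awaitTermination()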
Technical Skills
Azure Cloud: Azure Data Factory, Azure Databricks, Azure Synapse, ADLS, Azure Functions, Azure SQL Database, Azure SQL Data Warehouse, Logic Apps
AWS Cloud: EC2, S3, ELB, EBS, VPC, Auto Scaling, CloudFront, CloudWatch, Kinesis, Redshift
Hadoop Ecosystem: Apache Hadoop, Spark, HDFS, MapReduce, Pig, Hive, Sqoop, Flume, HBase, PySpark
Cloud Data Warehouse: Snowflake
Apache Spark: Spark 3.0.1 (Spark Core, Spark SQL, Spark Streaming)
Programming Languages: Java, Python, Scala, Shell Scripting
Hadoop Distributions: Cloudera Distribution 6.12 and Hortonworks (HDP 2.5)
NoSQL Databases: Cosmos DB, MongoDB, Cassandra
Visualization Tools: Tableau, Power BI
Deployment Tools: Azure DevOps, Git, Jenkins, Docker, Kubernetes
J2EE Components: JSP, Servlets, JavaBeans, JDBC 2.0
RDBMS: Oracle 11g/10g/9i, MS SQL, DB2, MySQL
Web Design: HTML, CSS, jQuery, JavaScript
IDE Tools: NetBeans 6.5, MyEclipse 6.0, Eclipse 3.7, Eclipse 4.2 Juno
Web/Application Servers: Apache Tomcat 6.0/7.0/8.0, WebLogic 8.1, JBoss 7.0, WAS 7.0
Version Control Tools: CVS, SVN, Git
Database Tools: Toad, SQL Developer
JavaScript Frameworks: AngularJS 1.0, jQuery
Testing Framework: JUnit
Message Broker: Apache Kafka
Web Services: SOAP, RESTful web services (Jersey), microservices
REST Clients: SOAP UI, Postman
Other Tools: PuTTY, VPN, WinSCP, JIRA, Toad, SQL Developer, VS Code
Projects Experience
E&Y, USA - Senior Data Engineer
July-2023 - July-2024
Set up the ingestion process to fetch data from the PM1 server via API and push it into the data lake.
Transformed the data using Azure Databricks PySpark code and inserted it into the MS SQL Server database.
Worked on building Azure data pipelines and Databricks code, and coordinated with the business for requirement gathering and feedback.
Created Databricks notebooks to streamline and curate data for various business use cases, and mounted Blob Storage on Databricks.
Created and optimized Snowflake schemas, tables, and views to support efficient data storage
and retrieval for analytics and reporting purposes.
Used Azure DevOps to promote code across the Dev, QA, and Prod environments.
Wrote PySpark code to read JSON files and write the content to an ADLS location (a brief sketch follows this project).
Set up chained Azure data pipelines that run one after another to push data to different systems.
Read CSV files from Azure Blob Storage and used PySpark code to load the data into the data warehouse.
Pushed the latest data into the respective application tables for downstream Power BI report generation.
Designed an ETL pipeline that reads data from an Oracle RDBMS and stores the data in ADLS.
Managed code repositories using Git within Azure DevOps for version control.
Implemented CI/CD pipelines using Azure DevOps for data engineering solutions.
Developed Databricks notebooks using PySpark and Spark SQL for data extraction, transformation, and aggregation from multiple systems, and stored the results on Azure Data Lake Storage.
Engineered Spark jobs with incremental capabilities to extract data from source databases into
Gen2 staging, optimizing data extraction processes for subsequent loading into Snowflake.
Technology used: Microsoft SQL Server Studio, Azure Data Factory, Azure Databricks, SQL, Snowflake, Power BI
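A minimal, illustrative PySpark sketch of the JSON-to-ADLS step described above; the storage paths, the record_id key, and the SQL Server connection details are placeholders, not the actual project configuration.

# Sketch: curate JSON landed by the API ingestion job and persist to ADLS and SQL Server.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("pm1-json-ingest").getOrCreate()

# Read raw JSON files landed by the API ingestion job (placeholder path).
raw_df = spark.read.json("abfss://raw@<storage-account>.dfs.core.windows.net/pm1/")

# Light curation: drop fully-null rows and de-duplicate on an assumed business key.
curated_df = raw_df.dropna(how="all").dropDuplicates(["record_id"])

# Persist the curated data to an ADLS location in Parquet for downstream use.
curated_df.write.mode("overwrite").parquet(
    "abfss://curated@<storage-account>.dfs.core.windows.net/pm1/"
)

# Optionally push the same data to MS SQL Server via JDBC (credentials assumed to
# come from a Key Vault-backed secret scope, as in the pipeline described above).
curated_df.write.format("jdbc").options(
    url="jdbc:sqlserver://<server>:1433;databaseName=<db>",
    dbtable="dbo.pm1_curated",
    user="<user>",
    password="<password>",
).mode("append").save()

spark.stop()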
Eversource Energy, USA - Senior Data Engineer
Feb-2022 - June-2023
Set up the ingestion process to migrate data from MS SQL Server into ADLS Gen2.
Transformed the data using Azure Databricks and pushed it to target systems for further reporting.
Implemented ETL transformations utilizing Dataflow within Azure Data Factory (ADF), aligning
with business requirements to streamline data processing workflows effectively.
Configured email notifications for ADF pipelines using Logic Apps, enabling proactive monitoring
and alerting of pipeline execution status and ensuring timely response to any issues or failures.
Developed ETL logic after analyzing technical specifications and layout documents to perform data mapping from the data lake to the outbound consumption layer.
Followed Agile methodology in Azure DevOps to track stories across sprints, updating tasks with progress and incorporating peer-review comments.
Worked closely with Business Teams to resolve any project related technical concerns.
Involved in coordination with the business for requirement gathering and feedback.
Developed Spark SQL/Scala/Python scripts on Azure Databricks using the Microsoft Azure Cloud portal and Azure SQL DB to process data per the Data Science team's requirements.
Maintained version control of code using Azure DevOps and Git repositories.
Used various Spark Transformations and Actions for cleansing the input data.
Developed and implemented a Lakehouse architecture using Bronze, Silver, and Gold layers for optimized data migration and processing (sketched after this project).
Configured Linked Services, Datasets, Cloud and self-hosted Integration Runtime, and Schedule
Triggers in Azure Data Factory.
Ingested data from different source systems into the data warehouse.
Implemented data profiling, data cleansing, data transformation, and data modelling.
Calculated the Asset Health Index from various parameters of different asset classes such as poles, network transformers, network protectors, pad-mount switchgear, and regulators.
Categorized assets (Reject, Non-Reject, Replace, etc.) based on their damage values.
Predicted each asset's health based on its condition rating.
Technology used: Microsoft SQL Server Studio, Azure Data Factory, Azure Databricks, SQL, PySpark, Power BI
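An illustrative PySpark/Delta sketch of the Bronze/Silver/Gold Lakehouse flow described above; the database names, storage paths, and columns are assumptions, and the bronze/silver/gold databases are assumed to exist.

# Sketch: land raw data as Bronze, cleanse into Silver, aggregate into Gold.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("lakehouse-medallion").getOrCreate()

# Bronze: land the raw extract as-is in Delta format (placeholder landing path).
bronze_df = spark.read.parquet("abfss://landing@<storage-account>.dfs.core.windows.net/assets/")
bronze_df.write.format("delta").mode("append").saveAsTable("bronze.assets")

# Silver: cleanse and conform (trim text fields, de-duplicate on an assumed key).
silver_df = (
    spark.table("bronze.assets")
    .withColumn("asset_class", F.trim(F.col("asset_class")))
    .dropDuplicates(["asset_id"])
)
silver_df.write.format("delta").mode("overwrite").saveAsTable("silver.assets")

# Gold: business-level aggregate, e.g. average health index per asset class.
gold_df = (
    spark.table("silver.assets")
    .groupBy("asset_class")
    .agg(F.avg("health_index").alias("avg_health_index"))
)
gold_df.write.format("delta").mode("overwrite").saveAsTable("gold.asset_health_summary")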
HSBC Bank - Data Architect / Engineer
Jan-2020 - Jan-2022
Designed an ETL framework to ingest data and files from different source systems into the data lake.
Transformed the data by applying business rules and pushed it to target systems.
Created reports on the refined data using Tableau.
Engineered a serverless data integration solution using AWS Lambda and Glue, which
automated data flows from sources, improving data availability.
Set up AWS S3 buckets and EC2 instances for real-time data acquisition applications.
Directed the design of a high-performance data analytics platform on AWS, leveraging EMR, S3,
Spark, and Airflow to manage data.
Worked on large datasets stored in AWS S3 buckets, using Spark DataFrames in Glue for preprocessing.
Designed and developed ETL processes in AWS Glue to migrate data from external sources such as S3 (ORC/Parquet/text files) into AWS Redshift (a brief sketch follows this project).
Loaded Transformed data to AWS Redshift using Spark Batch Processing.
Integrated with AWS services such as Amazon CloudWatch for monitoring and alerting.
Worked on Creating, Debugging, Scheduling and Monitoring jobs using Airflow and Oozie.
Orchestrated the ingestion pipeline using CloudWatch and Lambda for time-based triggers.
Implemented best practices for ETL processes with Apache Spark to transform raw data into
user-friendly dimensional data for self-service reporting.
Designed and implemented robust data pipelines on the Databricks platform, leveraging Spark
SQL and Spark Streaming for real-time data processing.
Created and managed tables and views to facilitate data migration, querying, and reporting.
Contributed to the Data Ingestion Management System, an in-house system designed to ingest and transform data.
Technology used: AWS, S3, EMR, RDS, CloudWatch, Redshift, Lambda, Tableau, Juniper, Control-M, JIRA, Confluence, PuTTY, WinSCP
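A hedged sketch of an AWS Glue ETL job in the spirit of the S3-to-Redshift loads described above; the Glue connection name, bucket paths, key column, and table names are placeholders.

# Sketch: read Parquet from S3, de-duplicate, and load into Redshift via a Glue connection.
import sys
from awsglue.utils import getResolvedOptions
from awsglue.context import GlueContext
from awsglue.dynamicframe import DynamicFrame
from awsglue.job import Job
from pyspark.context import SparkContext

args = getResolvedOptions(sys.argv, ["JOB_NAME"])
sc = SparkContext()
glue_context = GlueContext(sc)
spark = glue_context.spark_session
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

# Read Parquet files from the S3 landing bucket into a DynamicFrame.
source_dyf = glue_context.create_dynamic_frame.from_options(
    connection_type="s3",
    connection_options={"paths": ["s3://<landing-bucket>/transactions/"]},
    format="parquet",
)

# Basic preprocessing with the Spark DataFrame API (assumed transaction_id key).
source_df = source_dyf.toDF().dropDuplicates(["transaction_id"])

# Write the transformed data to Redshift through a Glue catalog connection.
target_dyf = DynamicFrame.fromDF(source_df, glue_context, "target_dyf")
glue_context.write_dynamic_frame.from_jdbc_conf(
    frame=target_dyf,
    catalog_connection="<redshift-connection>",
    connection_options={"dbtable": "public.transactions", "database": "<db>"},
    redshift_tmp_dir="s3://<temp-bucket>/redshift-staging/",
)

job.commit()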
Citi Bank, USA - Data Engineer
July-2018 - Dec-2019
Created Azure Data Factory pipeline, resource group, and activities for data migration.
The pipeline picks up data from Azure Event Hub, transforms it, and sends it to the target system.
Prepared datasets and answer sets based on business rules and visualized them.
Designed and implemented data processing workflows using Azure Databricks, leveraging Spark
for large-scale data transformations.
Transformed data using Azure Synapse, designed schemas, facts, and dimensions.
Developed data extraction pipelines to extract data from various sources, such as on-premises
databases, APIs, or cloud-based applications.
Utilized best practices for data security and encryption during transit.
Implemented mechanisms for identifying data changes or updates in the source systems using
techniques like change tracking, timestamps, or incremental markers.
Designed and implemented the delta load process to transfer only changed or new data from source to destination, reducing processing time and network bandwidth consumption (sketched after this project).
Developed incrementally processed ETL pipelines to handle real-time data updates in migration.
Optimized the full delta load process to achieve better performance, reduced latency, and
minimized resource consumption, thereby maximizing system efficiency.
Leveraged Change Data Capture techniques to propagate changes effectively during migration.
Prepared Tableau reports for the business according to the requirements.
Technology used: Azure Data Factory, Azure Synapse, ADLS, PySpark, Tableau
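An illustrative sketch of the watermark-based delta load described above; the control table, source/target paths, and column names are assumptions, and first-run handling (no prior watermark) is omitted for brevity.

# Sketch: incremental (delta) load using a high-water-mark timestamp.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("delta-load").getOrCreate()

# 1. Read the last successful watermark recorded by the previous run
#    (stored here in an assumed control table).
last_watermark = (
    spark.table("control.watermarks")
    .filter(F.col("table_name") == "customer_events")
    .agg(F.max("watermark_value"))
    .collect()[0][0]
)

# 2. Pull only rows changed since that watermark from the source extract.
incremental_df = (
    spark.read.parquet("abfss://raw@<storage-account>.dfs.core.windows.net/customer_events/")
    .filter(F.col("last_modified_ts") > F.lit(last_watermark))
)

# 3. Append the changed rows to the destination and record a new watermark.
incremental_df.write.mode("append").parquet(
    "abfss://curated@<storage-account>.dfs.core.windows.net/customer_events/"
)

new_watermark = incremental_df.agg(F.max("last_modified_ts")).collect()[0][0]
if new_watermark is not None:
    spark.createDataFrame(
        [("customer_events", new_watermark)], ["table_name", "watermark_value"]
    ).write.mode("append").saveAsTable("control.watermarks")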
CitiusTech Healthcare - Solution Architect
Feb-2018 - June-2018
Understanding the data of different Healthcare source systems and designing the data ingestion,
transformation, and reconciliation process accordingly.
Prepared functional and technical design documents of different H-Scale components i.e. Data
Quality, Data Transformation, and Reconciliation.
Implemented end-to-end data pipelines using Azure Data Factory to extract, transform, and load
(ETL) data from diverse sources into MS SQL DB.
Improved database performance through query tuning, indexing strategies, and partitioning
techniques.
Collaborated across teams to integrate big data technologies such as Hadoop, Hive, and Kafka
into existing infrastructures.
Developed and maintained data pipelines using Sqoop, Flume, and Kafka to ingest, transform,
and process data for analysis.
Performed data aggregation and analysis on large-scale datasets using Apache Spark, Scala, and Hive, improving insights for the business (a brief sketch follows this project).
Developed data processing workflows leveraging Spark for distributed processing and
transformations.
Developed Spark jobs to transform data and apply business transformation rules to load/process
data across the enterprise and application-specific layers.
Developed and optimized Spark jobs for data transformations and aggregations.
Technology used: PySpark, Hive, Sqoop, Oozie
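A minimal PySpark sketch of the Hive-backed aggregation work described above; the database, table, and column names are placeholders rather than the client's actual schema.

# Sketch: apply a simple business rule and aggregate Hive-resident claims data.
from pyspark.sql import SparkSession, functions as F

spark = (
    SparkSession.builder.appName("claims-aggregation")
    .enableHiveSupport()
    .getOrCreate()
)

# Read claims ingested into Hive upstream (e.g. via Sqoop/Flume/Kafka).
claims_df = spark.table("healthcare.claims")

# Business rule: keep approved claims, then aggregate per provider and month.
summary_df = (
    claims_df
    .filter(F.col("claim_status") == "APPROVED")
    .withColumn("claim_month", F.date_format("service_date", "yyyy-MM"))
    .groupBy("provider_id", "claim_month")
    .agg(
        F.count("*").alias("claim_count"),
        F.sum("claim_amount").alias("total_amount"),
    )
)

# Publish the aggregate back to Hive for downstream analysis and reporting.
summary_df.write.mode("overwrite").saveAsTable("healthcare.claims_monthly_summary")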
State Government, India - Hadoop Developer
Nov-2015 - Jan-2018
Involved in preparing end-to-end solution design of Big Data projects.
Involved in requirement gathering, data analysis, design, planning, and preparing mapping
documents.
Created a web application (Digital Library) in Java/J2EE with Solr on top of Hadoop to provide easy access to, and content-based search over, documents, images, audio, and video files of various departments.
Created a web application (Face Recognition) in Java/J2EE using Python, HBase, and Phoenix that matches an input face image against millions of images residing in HBase.
Created a web application (Citizen-360) to track citizens' activity across each government department.
Implemented Solr in the Digital Library application for content-based document search.
Utilized big data ecosystems such as Hadoop, Spark, and Cloudera to load and transform large
sets of structured, semi-structured, and unstructured data.
Utilized Hive queries and Spark SQL to analyze and process data, meeting specific business
requirements and simulating MapReduce functionalities.
Prepared an ETL framework using Sqoop, Pig, and Hive to bring in data from various sources
and make it available for consumption.
Developed MapReduce programs for unstructured data (video, images, and blog data) and processed structured data using Pig and Hive for analysis.
Performed log analysis of the E-Mitra and Bhamashah web applications using Kafka to identify problem areas of the applications.
Created various use cases based on e-Governance project data.
Developed data ingestion, processing, and post-ingestion processes for various sources into the data lake for CTD, VATMAN, EXCISE, and HCD.
Loaded data from Oracle and SQL Server into Hive tables using Sqoop.
Engaged in data cleaning and analysis using Hive.
Performed fraud detection analysis across multiple database transactions using Spark SQL (a brief sketch follows this project).
Performed sentiment analysis on grievance feedback to identify positive and negative sentiment for each problem statement.
Technology used: HDFS, Hive, Sqoop, Tika, Solr, Spark, HDP2.4, Tableau, Core Java, J2EE, Linux,
Teradata Aster, Teradata App Center
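A hedged Spark SQL sketch in the spirit of the cross-database transaction analysis described above; the table names, columns, and flagging thresholds are purely illustrative.

# Sketch: flag unusual transaction activity across Sqoop-loaded Hive tables.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder.appName("transaction-analysis")
    .enableHiveSupport()
    .getOrCreate()
)

# Hive tables populated upstream by Sqoop from the departmental databases.
flagged_df = spark.sql("""
    SELECT t.citizen_id,
           COUNT(*)                      AS txn_count,
           SUM(t.amount)                 AS total_amount,
           COUNT(DISTINCT t.department)  AS departments
    FROM   dept_transactions t
    GROUP  BY t.citizen_id
    HAVING COUNT(*) > 100 OR SUM(t.amount) > 1000000
""")

# Persist the flagged records to Hive for review and Tableau reporting.
flagged_df.write.mode("overwrite").saveAsTable("analytics.flagged_transactions")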
Nationwide Building Society, UK - Hadoop Developer
Jan-2012 - Oct-2015
Created Hive internal and external tables per requirements, defined with appropriate partitions for efficiency (a brief sketch follows this project).
Developed, installed, and configured Hadoop ecosystem components that moved data from individual servers to the Hadoop cluster.
Wrote ETL scripts using Sqoop to transfer required data from Hadoop to the database.
Utilized Hive queries and Spark SQL to analyze and process data, meeting specific business
requirements and simulating MapReduce functionalities.
Migrated data from Oracle to Hadoop using Sqoop for processing, enhancing data
management and processing capabilities.
Installed and configured a multi-node Hadoop cluster for data storage and processing.
Imported and exported data into HDFS, HBase and Hive using Sqoop.
Engaged in requirement gathering, analysis, designed architecture for end-to-end data flow.
Engaged in Hadoop cluster setup using Cloudera distribution.
Created workflows using Oozie for data ingestion from various systems into Hadoop.
Created various reports for the client using Tableau, which they used to offer loans and cards to their customers.
Developed complex calculated fields for the business logic, field actions, sets and parameters to
include various filtering capabilities for the dashboards, and to provide drill down features for the
detailed reports.
Technology used: HDFS, MapReduce, Hive, Sqoop, PIG, Oozie, Tableau, CDH 5.4, Core Java, UNIX
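An illustrative Spark SQL sketch of the partitioned Hive external table pattern described above; the database, columns, HDFS path, and partition value are placeholders.

# Sketch: define a partitioned Hive external table over Sqoop-landed data and query it.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder.appName("hive-external-tables")
    .enableHiveSupport()
    .getOrCreate()
)

# External table over data landed in HDFS (e.g. by Sqoop), partitioned by load
# date so queries can prune partitions for efficiency.
spark.sql("""
    CREATE EXTERNAL TABLE IF NOT EXISTS lending.customer_accounts (
        account_id    STRING,
        customer_id   STRING,
        product_code  STRING,
        balance       DECIMAL(18,2)
    )
    PARTITIONED BY (load_date STRING)
    STORED AS PARQUET
    LOCATION '/data/lending/customer_accounts'
""")

# Register any partitions already present on HDFS, then query with pruning.
spark.sql("MSCK REPAIR TABLE lending.customer_accounts")
daily_df = spark.sql(
    "SELECT product_code, COUNT(*) AS accounts "
    "FROM lending.customer_accounts WHERE load_date = '2015-06-30' "
    "GROUP BY product_code"
)
daily_df.show()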
Matson, USA - Java Developer
Aug-2010 - Dec-2012
Developed several modules for the project.
Involved in unit and system integration testing, addressing technical challenges, and providing
the technical solution.
Implemented Struts 1.2 MVC architecture with Spring dependency injection and AOP.
Worked on various types of validation and authentication.
Worked on Spring and Hibernate integration at the DAO layer.
Involved in the design and coding of DAO classes and the design of RESTful services.
Involved in team handling and client interaction.
Developed one complete module of the project, Customer Profile.
Technology used: Core Java, J2EE, Spring, Struts, Hibernate, HTML, CSS, jQuery, Oracle
Videocon Telecommunication - Java Developer
Feb-2007 - Jul-2010
Analyzed the business requirements to understand the application.
Designed the complete application flow according to business requirement documents.
Created the high-level design document and designed the database and web pages.
Involved in coding, unit testing, system testing, and enhancement of the project.
Developed server-side code using Hibernate DAOs.
Responsible for unit testing and bug fixing.
Engaged in database and web-page design.
Engaged in regular client interaction for feedback and enhancements.
Technology used: Core Java, J2EE, Struts, Hibernate, HTML, CSS, jQuery, IBM RSA and DB2
Educational Qualification
Master of Computer Applications, University of Rajasthan, Jaipur, Rajasthan
Bachelor of Science, University of Rajasthan, Jaipur, Rajasthan
Certifications
Microsoft Certified Azure Data Engineer Associate
Sun Certified Java Programmer (SCJP 1.5)