Candidate's Name
SENIOR DATA ENGINEER
Mobile no: PHONE NUMBER AVAILABLE
Email ID: EMAIL AVAILABLE
PROFESSIONAL SUMMARY
Around 5 years of IT experience in analysis, design, and development of Big Data solutions in Scala, PySpark,
Hadoop, and HDFS environments, with additional experience in Python.
Implemented Big Data solutions using the Hadoop technology stack, including PySpark, Hive, and Sqoop.
Developed complex SQL queries against relational databases such as Oracle and SQL Server in
support of data warehousing and data integration solutions.
Firm understanding of Hadoop architecture and its components, including HDFS, JobTracker,
TaskTracker, NameNode, DataNode, and MapReduce programming.
Set up a Jenkins master and multiple agents for the entire team as a CI tool as part of the
continuous development and deployment process.
Installed and configured Apache Airflow for workflow management, created workflows in
Python, and built DAGs in Airflow to run jobs sequentially and in parallel (a DAG sketch follows this summary).
Experienced in optimizing PySpark jobs to run on Kubernetes clusters for faster data
processing.
Converted Hive queries into Spark actions and transformations by creating RDDs and
DataFrames from the required files in HDFS.
Provided support to data analysts in running Hive queries and building ETL pipelines.
Imported and exported data into HDFS and Hive using Sqoop.
Experienced in designing, architecting, and implementing scalable cloud-based web applications
using AWS and Azure.
Involved in software development, data warehousing, analytics, and data engineering
projects using Hadoop, MapReduce, Hive, and other open-source tools and technologies.
Worked on reading and writing multiple data formats like Parquet on HDFS using PySpark.
Experienced in requirement analysis, application development, application migration, and
maintenance using the Software Development Lifecycle (SDLC) and Python technologies.
Defined user stories, drove the agile board in JIRA during project execution, and participated in
sprint demos and retrospectives.
Strong working experience with SQL and NoSQL databases, data modeling and data pipelines.
Involved in end-to-end development and automation of ETL pipelines using SQL and Python.
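As a rough illustration of the Airflow work mentioned above, here is a minimal DAG sketch; it assumes Airflow 2.x with the BashOperator, and the DAG id, task names, and commands are all hypothetical placeholders, not the actual production jobs:

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

# Hypothetical DAG: extract runs first, the two transforms run in
# parallel, and load runs only after both transforms finish.
with DAG(
    dag_id="daily_etl",                  # hypothetical name
    start_date=datetime(2023, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    extract = BashOperator(task_id="extract", bash_command="echo extract")
    transform_a = BashOperator(task_id="transform_a", bash_command="echo transform_a")
    transform_b = BashOperator(task_id="transform_b", bash_command="echo transform_b")
    load = BashOperator(task_id="load", bash_command="echo load")

    extract >> [transform_a, transform_b] >> load
```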
TECHNICAL SKILLS
Big Data Eco System: HDFS, MapReduce, Hive, Sqoop, HBase, Kafka Connect, Spark, Zookeeper, Amazon Web Services, Airflow
Hadoop Distributions: Apache Hadoop 2.x, Cloudera CDP
Programming Languages: Python, Java, Shell Scripting
Databases: MySQL, MS SQL Server, HBase
Version Control: Git, Bitbucket
Cloud Technologies: Amazon Web Services (EC2, S3), Azure Databricks, Snowflake
WORK EXPERIENCE
Senior Data Engineer | Lowe's Companies Inc | Charlotte, NC | June 2023 - Present
Migrated complex data jobs from Teradata to Hive and developed ETL pipelines to push the
loaded data into Apache Druid.
Developed Airflow connectivity for all ETL pipelines and migrated all Oozie jobs to
Airflow.
Worked continuously with cross-functional teams (data analysts and software engineers) to
create PySpark jobs using Spark SQL, and helped them build reports on top of data pipelines
(see the sketch below).
Led a team of junior data engineers in designing and developing data transformations and
data management.
Environment: Python, HDFS, Spark, ETL, Hive, Yarn, Jenkins, MySQL, RDBMS,
Airflow, Collibra, Apache Druid, Oozie
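A minimal sketch of the kind of PySpark/Spark SQL job described above; the database, table names, and columns are hypothetical, and loading the staged result into Apache Druid would happen in a separate ingestion step outside this script:

```python
from pyspark.sql import SparkSession

# Spark session with Hive support so spark.sql can query Hive tables
spark = (
    SparkSession.builder
    .appName("hive_to_druid_staging")    # hypothetical job name
    .enableHiveSupport()
    .getOrCreate()
)

# Aggregate a Hive table (hypothetical schema) into a daily summary
daily = spark.sql("""
    SELECT store_id, sale_date, SUM(amount) AS total_sales
    FROM sales.transactions
    GROUP BY store_id, sale_date
""")

# Stage the result back to Hive; a Druid ingestion spec would pick it up
daily.write.mode("overwrite").saveAsTable("staging.daily_sales")
```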
Data Engineer | Conch Technologies Inc | Memphis, Tennessee | October 2022 - May 2023
Developed Spark applications using PySpark per business requirements.
Designed robust, reusable, and scalable data-driven solutions and data pipeline frameworks in
Python to automate the ingestion, processing, and delivery of structured and unstructured data,
in both batch and real-time streams.
Hands-on experience developing DataFrames and optimized SQL queries in Spark SQL.
Built data warehouse structures and created fact, dimension, and aggregate
tables using dimensional modeling with star and snowflake schemas.
Developed Spark applications in PySpark on a distributed environment to load large numbers
of CSV files with differing schemas into Hive ORC tables (see the sketch below).
Migrated data from Hive to MySQL for display in the UI, using a PySpark job that runs across
different environments.
Applied transformations to data loaded into Spark DataFrames and performed in-memory
computation to generate the output response.
Forecasted future trends in ATM cash transactions and cheque counts by performing time
series analysis on historical data using a seasonal ARIMA model.
Experienced with Spark SQL and creating RDDs using PySpark; extensive
experience with ETL of large datasets using PySpark on HDFS.
Environment: Python, HDFS, Spark, ETL, Hive, Yarn, HBase, Jenkins, MySQL,
RDBMS, Airflow, Collibra, Seasonal ARIMA, Time Series Analysis
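A minimal sketch of loading CSV files into a Hive ORC table with PySpark, as in the bullet above; the HDFS path, job name, and table name are hypothetical:

```python
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("csv_to_hive_orc")          # hypothetical job name
    .enableHiveSupport()
    .getOrCreate()
)

# Read a directory of CSV files from HDFS, inferring the schema
df = (
    spark.read
    .option("header", "true")
    .option("inferSchema", "true")
    .csv("hdfs:///landing/transactions/")   # hypothetical path
)

# Append the data into a Hive-managed ORC table
df.write.format("orc").mode("append").saveAsTable("warehouse.transactions_orc")
```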
Data Engineer | Development Bank of Singapore (DBS Bank) | Hyderabad, Telangana | July 2018 - February 2022
Responsible for the design and development of Spark SQL scripts based on functional
specifications. Created HBase tables to store various formats of data coming from Spark.
Hands-on experience working with Continuous Integration and Deployment (CI/CD) using
Jenkins.
Developed ETL pipelines into and out of the data warehouse using a combination of Python and Spark SQL.
Imported and exported data between Oracle and HDFS/Hive using Sqoop.
Scheduled Spark jobs using the Airflow scheduler and monitored their performance (see the sketch below).
Used Teradata to develop and run history migration scripts over millions of records.
Worked on the Collibra platform to create metadata for thousands of tables.
Worked with data stewards to create Data Quality rules and Data Quality checks in Collibra.
Responsible for the design, implementation, and architecture of very large-scale data
intelligence solutions around big data platforms.
Implemented a Big Data solution using Hadoop, Hive, and Informatica to pull and load data
into HDFS.
Developed data processing and data manipulation tasks using PySpark and loaded data into
target destinations.
Wrote high-quality documentation describing ETL routines, data mappings, and other
artifacts needed to design data migration routines.
Environment: Python, HDFS, Spark, ETL, Sqoop, Collibra, Airflow, HBase, CI/CD
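A rough sketch of scheduling a Spark job from Airflow, as in the bullets above; it assumes the Apache Spark provider package is installed, and the DAG id, script path, and connection id are hypothetical:

```python
from datetime import datetime

from airflow import DAG
from airflow.providers.apache.spark.operators.spark_submit import SparkSubmitOperator

# Hypothetical DAG that submits one PySpark job nightly at 02:00
with DAG(
    dag_id="nightly_spark_etl",              # hypothetical name
    start_date=datetime(2021, 1, 1),
    schedule_interval="0 2 * * *",
    catchup=False,
) as dag:
    run_etl = SparkSubmitOperator(
        task_id="run_etl",
        application="/opt/jobs/etl_job.py",  # hypothetical PySpark script
        conn_id="spark_default",
    )
```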
Big Data Engineer | Que Technologies | Hyderabad, India | April 2017 - February 2018
Responsible for the design, implementation, and architecture of very large-scale data
intelligence solutions around big data platforms.
Worked on SQL queries in dimensional data warehouses and relational data warehouses.
Performed data analysis and data profiling using complex SQL queries on various systems.
Troubleshot and resolved data processing issues and proactively engaged in data modeling
discussions.
Worked on RDD architecture, implementing Spark operations on RDDs and optimizing
transformations and actions in Spark.
Using Azure Data Factory, created data pipelines and data flows and triggered the pipelines.
Wrote Spark programs using Python, PySpark, and pandas for performance tuning,
optimization, and data quality validation.
Developed Kafka producers and consumers for streaming millions of events per second
(see the sketch below).
Used Tableau to build customized interactive reports, worksheets, and dashboards.
Developed Spark programs using the Scala and Java APIs and performed transformations and
actions on RDDs.
Designed the ETL process from various sources into Hadoop/HDFS for analysis and further
processing of data modules.
Worked on object detection using Python OpenCV to capture images of moving objects.
Environment: HDFS, Python, SQL, MapReduce, Spark, Kafka, Hive, Yarn, Zookeeper, Shell Scripting,
RDBMS, ETL, PySpark, Hadoop.
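A minimal sketch of a Kafka producer/consumer pair in Python, as in the streaming bullet above; it assumes the kafka-python client, and the broker address, topic, and group id are hypothetical:

```python
import json

from kafka import KafkaConsumer, KafkaProducer

# Producer: serialize events as JSON and send them to a topic
producer = KafkaProducer(
    bootstrap_servers="localhost:9092",      # hypothetical broker
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)
producer.send("clickstream", {"user_id": 42, "action": "page_view"})
producer.flush()

# Consumer: read events from the same topic as part of a consumer group
consumer = KafkaConsumer(
    "clickstream",                           # hypothetical topic
    bootstrap_servers="localhost:9092",
    group_id="analytics",                    # hypothetical group id
    auto_offset_reset="earliest",
    value_deserializer=lambda v: json.loads(v.decode("utf-8")),
)
for message in consumer:
    print(message.value)                     # process each event
```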
Software Engineer | University of Hyderabad | Hyderabad, India | April 2016 - December 2016
Used Git to maintain repositories: creating and merging branches, committing changes, and
checking out, moving, and removing files.
Created data models, stored procedures, views, functions, and queries for data analysis and
manipulation. Maintained and upgraded databases and created backups in SQL.
Parsed the data received after testing to check for inconsistencies and saved it to the
database.
Imported and exported data from local and external file systems and RDBMS to
HDFS.
Worked on mobile crowd sensing by performing simulations using the CrowdSenSim simulator.
Created graphs and charts for displaying the simulated data using Tableau.
Environment: GIT, MySQL, RDBMS, Shell Script, JIRA, Tableau.
EDUCATION
Bachelor of Engineering in Computer Science, Jawaharlal Nehru Technological University Kakinada, 2018
GPA: 8.89/10