PRAVALLIKA
Ph: PHONE NUMBER AVAILABLE                                                  Email: EMAIL AVAILABLE
                                                                  LinkedIn: LINKEDIN LINK AVAILABLE
PROFESSIONAL SUMMARY:
  Over 5 years of experience in Data Engineering, Data Pipeline Design, Development, and Implementation as a Data
   Engineer/Data Developer and Data Modeler.
  Ingested and transformed data from multiple sources, landing it into AWS S3, HDFS, and Snowflake.
  Constructed ETL jobs to migrate and process data across various platforms, including Teradata, MongoDB, SQL
   Server, and Presto.
  Developed data processing applications using Scala and Apache Spark for efficient data transformation and analysis.
  Developed data processing triggers using AWS Lambda functions with Python and utilized Boto3 for AWS resource
   management.
  Good working experience with big data on AWS cloud services (EC2, S3, EMR, DynamoDB, and Redshift) and Apache Spark
   (see the PySpark sketch following this summary).
  Implemented Spark jobs for data preprocessing and analytics, ensuring efficient runtime and resource usage.
  Utilized Python for implementing Machine learning algorithms like Generalized Linear Models, Random Forest, and
   Gradient Boosting.
  Experienced in Python data manipulation for loading and extraction, as well as Python libraries such as NumPy,
   SciPy, and Pandas for data analysis and numerical computations.
  Experienced in data transformation, data mapping from source to target database schemas, and data cleansing
   procedures.
  Analyzed data from Kafka sources to build predictive models and analysis frameworks.
  Developed comprehensive reports and dashboards using various reporting tools including Power BI, Tableau, and
   other BI platforms to deliver actionable insights.
  Developed Power BI reports and dashboards using various data sources, employing complex DAX calculations.
  Implemented data governance frameworks to ensure data quality, consistency, and security across the organization.
  Participated in code reviews and documented application functionality to maintain high standards of compliance
   and data integrity.
  Created and maintained detailed documentation of data analysis processes, ETL workflows, and system configurations.
  Utilized GitHub and Docker for CI/CD systems to build, test, and deploy applications.
  Employed Apache Airflow to automate data workflows and AWS Step Functions for orchestration.
  Created ETL jobs using PySpark for data migrations and loads into HDFS and Hive.
  Used Sqoop for data import from RDBMS to HDFS, converting data formats and optimizing storage.
  Actively participated in Agile/Scrum methodologies, contributing to iterative improvements and successful project
   delivery.

  Collaborated effectively with cross-functional teams to define KPIs, analyze data trends, and present insights to
   stakeholders, fostering a collaborative work environment.
  Leveraged creative problem-solving skills to develop an automated data validation tool, reducing manual effort by
   50% and improving data accuracy.
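
The ingestion and transformation work summarized above (sort, join, aggregate, and filter, landing curated data in AWS S3)
can be illustrated with a minimal PySpark sketch. The bucket paths, column names, and job name below are hypothetical
placeholders, not the actual production pipeline.

# Minimal PySpark ETL sketch: read raw files, apply join/filter/aggregate/sort
# transformations, and land the result in S3 as Parquet (hypothetical names).
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("orders_etl").getOrCreate()

orders = spark.read.option("header", True).csv("s3://example-bucket/raw/orders/")
customers = spark.read.option("header", True).csv("s3://example-bucket/raw/customers/")

daily_totals = (
    orders.filter(F.col("status") == "COMPLETED")           # filter
          .join(customers, on="customer_id", how="inner")   # join
          .groupBy("order_date", "region")                  # aggregate
          .agg(F.sum("amount").alias("total_amount"))
          .orderBy("order_date")                            # sort
)

# Land the curated dataset in S3 in Parquet format for downstream loads.
daily_totals.write.mode("overwrite").parquet("s3://example-bucket/curated/daily_totals/")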



TECHNICAL SKILLS SET
  Big Data Technologies                 Spark, Hive, Hadoop, HDFS, MapReduce, Kafka, Snowflake, Presto, MongoDB.
  Cloud Platforms                       AWS (Lambda, S3, EC2, Redshift, EMR, Glue), Databricks.
  ETL and Data Integration              Apache Airflow, Apache Sqoop, AWS Step Functions, Python, PySpark.
  Databases                             PostgreSQL, Oracle (11g, 12c), SQL Server, Teradata, DB2.
 Programming Languages                  Python (3.7), SQL, PL/SQL, Unix shell scripting, Java, Scala.
 Data Analysis and Visualization        Power BI, Google Analytics, Tableau, Jupyter Notebooks, DAX.
 CI/CD and Version Control              GitHub, Docker, SVN.
 Collaboration Tools                    Jira, Slack, Confluence, SharePoint, Microsoft Teams.
 Operating Systems                      Windows, Linux.
WORK EXPERIENCE:

Walmart Pharmacy, Bentonville, AR                                                                  May 2023 - Till Date
Data Engineer
Roles & Responsibilities:
      Worked on Big Data Integration and Analytics based on Spark, Hive, PostgreSQL, Snowflake, and MongoDB.
      Ingested data into a data lake from different sources and performed various transformations like sort, join,
       aggregations, and filter to process various datasets.
      Constructed data pipelines for pulling data from SQL Server and Hive. Landed the data in AWS S3 and loaded it
       into Snowflake after transforming.
       Worked with structured/semi-structured data ingestion and processing on AWS using S3 and Python, and migrated
        on-premises big data workloads to AWS.
      Built ETLs to load data from Presto, PostgreSQL, Hive, SQL Server to Snowflake using Apache Airflow, Python,
       and Spark.
      Created ETL jobs using Spark to perform data migrations and data loads into HDFS and Hive from different
       source systems.
      Reengineered existing ETL workflows to improve performance by identifying bottlenecks and optimizing code
       accordingly.
       Developed a reusable framework for future migrations that automates ETL from RDBMS systems to the data lake,
        utilizing Spark data sources and Hive data objects.
      Optimized data pipelines by implementing advanced ETL processes and streamlining data flow.
      Migrated data from a legacy Teradata system to MongoDB and built ETLs to load the data into MongoDB.
      Automated data flow between software systems using Apache Airflow.
      Implemented Spark jobs for data preprocessing, validation, normalization, and transmission.
       Wrote Spark code using Python and Scala as the primary programming languages to perform critical
        calculations.
       Used Apache Airflow with Python and Unix to submit Spark batch jobs to the EMR cluster (see the DAG sketch
        after this list).
      Developed data processing triggers for Amazon S3 using AWS Lambda functions with Python.
      Created Lambda functions with Boto3 to deregister unused AMIs in all application regions to reduce the cost of
       EC2 resources.
      Created and provisioned multiple Databricks clusters needed for batch and continuous streaming data
       processing and installed the required libraries for the clusters.
      Utilized GitHub and Docker for the runtime environment for the CI/CD system to build, test, and deploy.
      Utilized Python to implement different Machine learning algorithms, including Generalized Linear Model,
       Random Forest, and Gradient Boosting.
      Consumed data from Kafka sources and implemented an analysis model.
      Created Power BI reports and dashboards as per the business requirement using different Data Sources.
      Extensively used DAX to create complex calculated measures and columns in Power BI and Cubes.
      Participated in code reviews and demonstrated application functionality and configurations to the stakeholders.
      Implemented data governance frameworks to ensure data quality, consistency, and security by developing
       standardized data definitions, implementing validation checks, and ensuring compliance with data policies.
      Collaborated with cross-functional teams to define requirements and develop end-to-end solutions for complex
       data engineering projects.
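
A hedged sketch of the batch orchestration described above: a daily Airflow DAG that submits a spark-submit step to an
existing EMR cluster through Boto3. The cluster id, script path, region, and schedule are hypothetical placeholders, not
the production configuration.

# Minimal Airflow DAG sketch (hypothetical ids and paths): submit a Spark batch
# step to a running EMR cluster once a day.
from datetime import datetime

import boto3
from airflow import DAG
from airflow.operators.python import PythonOperator

def submit_spark_step():
    emr = boto3.client("emr", region_name="us-east-1")
    emr.add_job_flow_steps(
        JobFlowId="j-EXAMPLECLUSTER",  # placeholder EMR cluster id
        Steps=[{
            "Name": "daily_batch_load",
            "ActionOnFailure": "CONTINUE",
            "HadoopJarStep": {
                "Jar": "command-runner.jar",
                "Args": ["spark-submit", "--deploy-mode", "cluster",
                         "s3://example-bucket/jobs/daily_batch_load.py"],
            },
        }],
    )

with DAG(
    dag_id="emr_daily_batch",
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    PythonOperator(task_id="submit_spark_step", python_callable=submit_spark_step)
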
Environment: AWS (Lambda, S3, EC2, Redshift, EMR), Teradata 15, MongoDB, PostgreSQL, SQL, Oracle 12c,
PySpark, Hadoop, Hive, HDFS, Kafka, Airflow, Snowflake, Python 3.7, PyCharm, Jupyter Notebooks, Power BI, GitHub,
Docker, Git, Toad, Unix, Jira, Slack, Confluence, Agile/Scrum, XML.
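
The Lambda-based cost housekeeping mentioned in this role (deregistering unused AMIs with Boto3) might look roughly like
the sketch below. The region, ownership filter, and return payload are illustrative assumptions; pagination and snapshot
cleanup are omitted for brevity.

# Hedged Boto3/Lambda sketch: deregister AMIs not referenced by any EC2 instance.
import boto3

def lambda_handler(event, context):
    ec2 = boto3.client("ec2", region_name="us-east-1")  # placeholder region

    # AMIs owned by this account.
    images = ec2.describe_images(Owners=["self"])["Images"]

    # Image ids currently in use by instances (pagination omitted for brevity).
    reservations = ec2.describe_instances()["Reservations"]
    in_use = {instance["ImageId"]
              for reservation in reservations
              for instance in reservation["Instances"]}

    unused = [image["ImageId"] for image in images if image["ImageId"] not in in_use]
    for image_id in unused:
        ec2.deregister_image(ImageId=image_id)

    return {"deregistered": len(unused)}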

Duke Energy, Charlotte, NC                                                                    May 2020 - Nov 2022
Data Engineer
Roles & Responsibilities:
  Participated in requirements gathering and collaborated closely with architects and SMEs in designing and modeling
    data solutions.
  Handled data ingestions from various data sources, performed transformations using Spark, and loaded data into
    Amazon S3 and HDFS.
  Converted Hive/SQL queries into Spark transformations/actions using PySpark.
  Moved flat files generated from various feeds to Amazon S3 for further processing.
  Developed ETL tools to load data from source to target using Python, PySpark, Sqoop, Unix, and Amazon Redshift.
  Created Sqoop incremental imports, landed data in Parquet format in HDFS, and transformed it to ORC format using
    PySpark.
  Created reusable utilities and programs in Python to perform repetitive tasks such as sending emails and comparing
    data.

  Developed and maintained PL/SQL procedures to load data sent in XML files into Oracle tables.
  Used Sqoop to import data from RDBMS to Hadoop Distributed File System (HDFS) and analyzed the imported data
    using Hive.
  Created UNIX shell scripts to load data from flat files into Oracle tables.
  Developed Sqoop scripts to ingest data from Oracle, Teradata, and DB2 into HDFS and Hive.
  Developed Python and Hive scripts for creating reports from Hive data.
  Managed and monitored AWS resources such as EC2, S3, Redshift, and EMR clusters.
  Implemented data pipeline orchestration using AWS Glue and AWS Lambda functions.
  Developed MapReduce/Spark Python modules for machine learning and predictive analytics in Hadoop on AWS, and
    implemented a Python-based distributed random forest via Python Streaming (an illustrative Spark MLlib sketch
    appears after this section's environment list).
  Automated workflows and data processing using Apache Airflow and AWS Step Functions.
  Utilized Cloudera Impala for efficient querying and analysis of large datasets, optimizing performance and resource
    usage.
  Collaborated on ETL (Extract, Transform, Load) tasks, maintaining data integrity and verifying pipeline stability.
  Fine-tuned query performance and optimized database structures for faster, more accurate data retrieval and
    reporting.
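
The Parquet-to-ORC conversion of Sqoop-landed data described above can be illustrated with a short PySpark sketch; the
HDFS paths and partition column are hypothetical placeholders.

# Minimal PySpark sketch: read Sqoop-landed Parquet from HDFS and rewrite as ORC.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("parquet_to_orc").getOrCreate()

df = spark.read.parquet("hdfs:///landing/orders_parquet/")
(df.write
   .mode("overwrite")
   .partitionBy("load_date")   # hypothetical partition column
   .orc("hdfs:///curated/orders_orc/"))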

Environment: AWS (EC2, S3, Redshift, EMR, Glue, Lambda), Hadoop, Hive, Spark, HDFS, MapReduce, Python, PyCharm,
PostgreSQL, Oracle 11g, SQL, PL/SQL, TOAD, Unix, SharePoint, Teradata, DB2, SVN, Java, Eclipse.
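
As referenced above, the distributed random forest work could also be expressed in the same stack with Spark MLlib; this
is only an illustrative stand-in for the Python streaming implementation, and the feature columns, label, and paths are
hypothetical.

# Hedged Spark MLlib sketch: assemble features and train a distributed random forest.
from pyspark.sql import SparkSession
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.classification import RandomForestClassifier

spark = SparkSession.builder.appName("rf_example").getOrCreate()

data = spark.read.parquet("hdfs:///features/training_events/")
assembler = VectorAssembler(inputCols=["feature_a", "feature_b", "feature_c"],
                            outputCol="features")
train = assembler.transform(data)

rf = RandomForestClassifier(labelCol="label", featuresCol="features", numTrees=100)
model = rf.fit(train)
model.write().overwrite().save("hdfs:///models/random_forest")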


IQVIA, India                                                                          May 2019 - April 2020
Data Analyst
Roles & Responsibilities:
    Analyzed large datasets to identify trends, patterns, and insights that informed business decisions.
    Utilized SQL to query databases and retrieve necessary information for analysis.
    Created custom reports and dashboards in Google Analytics to visualize key metrics and trends, providing marketing
    and business teams with actionable insights.
    Utilized Python for basic data analysis tasks, including data exploration, statistical analysis, and summarizing key
    findings.
    Maintained documentation of data analysis processes, findings, and methodologies for future reference.
    Collaborated with cross-functional teams to define key performance indicators (KPIs) and establish tracking
    mechanisms in Google Analytics.
    Gained valuable experience working within a specific industry, applying learned concepts directly into relevant work
    situations.
    Provided ad-hoc analysis support for various business needs and projects.
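
A minimal sketch of the kind of exploratory analysis described above: pull records with SQL, then summarize a monthly
trend with pandas. The connection string, table, and column names are hypothetical placeholders.

# Hedged pandas sketch: query a table and aggregate sessions by channel and month.
import pandas as pd
from sqlalchemy import create_engine

engine = create_engine("postgresql://user:password@localhost:5432/analytics")  # placeholder

df = pd.read_sql("SELECT visit_date, channel, sessions FROM web_traffic", engine)
df["visit_date"] = pd.to_datetime(df["visit_date"])

monthly = (df.groupby(["channel", pd.Grouper(key="visit_date", freq="MS")])["sessions"]
             .sum()
             .reset_index())

print(monthly.head())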

Environment: Python, SQL, Google Analytics, Tableau, Microsoft Excel, Jupyter Notebooks, Windows, Linux, Microsoft
Teams, Slack
