PRAVALLIKA
Ph: PHONE NUMBER AVAILABLE                                                  Email: EMAIL AVAILABLE
                                                                  LinkedIn: LINKEDIN LINK AVAILABLE
PROFESSIONAL SUMMARY:
  Over 5 years of experience in Data Engineering, Data Pipeline Design, Development, and Implementation as a Data
   Engineer/Data Developer and Data Modeler.
  Ingested and transformed data from multiple sources, landing it into AWS S3, HDFS, and Snowflake.
  Constructed ETL jobs to migrate and process data across various platforms, including Teradata, MongoDB, SQL
   Server, and Presto.
  Developed data processing applications using Scala and Apache Spark for efficient data transformation and analysis.
  Developed data processing triggers using AWS Lambda functions with Python and utilized Boto3 for AWS resource
   management.
  Good working experience with big data on AWS cloud services (EC2, S3, EMR, DynamoDB, and Redshift) and Apache Spark
   (see the PySpark sketch following this summary).
  Implemented Spark jobs for data preprocessing and analytics, ensuring efficient runtime and resource usage.
  Utilized Python for implementing Machine learning algorithms like Generalized Linear Models, Random Forest, and
   Gradient Boosting.
  Experienced in Python data manipulation for loading and extraction, as well as Python libraries such as NumPy,
   SciPy, and Pandas for data analysis and numerical computations.
  Experienced in data transformation, data mapping from source to target database schemas, and data cleansing
   procedures.
  Analyzed data from Kafka sources to build predictive models and analysis frameworks.
  Developed comprehensive reports and dashboards using various reporting tools including Power BI, Tableau, and
   other BI platforms to deliver actionable insights.
  Developed Power BI reports and dashboards using various data sources, employing complex DAX calculations.
  Implemented data governance frameworks to ensure data quality, consistency, and security across the organization.
  Participated in code reviews and documented application functionality to maintain high standards of compliance
   and data integrity.
  Created and maintained detailed documentation of data analysis processes, ETL workflows, and system configurations.
  Utilized GitHub and Docker for CI/CD systems to build, test, and deploy applications.
  Employed Apache Airflow to automate data workflows and AWS Step Functions for orchestration.
  Created ETL jobs using PySpark for data migrations and loads into HDFS and Hive.
  Used Sqoop for data import from RDBMS to HDFS, converting data formats and optimizing storage.
  Actively participated in Agile/Scrum methodologies, contributing to iterative improvements and successful project
   delivery.

  Collaborated effectively with cross-functional teams to define KPIs, analyze data trends, and present insights to
   stakeholders, fostering a collaborative work environment.
  Leveraged creative problem-solving skills to develop an automated data validation tool, reducing manual effort by
   50% and improving data accuracy.
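
The ingestion and transformation work summarized above (sort, join, aggregate, and filter, landing curated data in AWS S3)
can be illustrated with a minimal PySpark sketch. The bucket paths, column names, and job name below are hypothetical
placeholders, not the actual production pipeline.

# Minimal PySpark ETL sketch: read raw files, apply join/filter/aggregate/sort
# transformations, and land the result in S3 as Parquet (hypothetical names).
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("orders_etl").getOrCreate()

orders = spark.read.option("header", True).csv("s3://example-bucket/raw/orders/")
customers = spark.read.option("header", True).csv("s3://example-bucket/raw/customers/")

daily_totals = (
    orders.filter(F.col("status") == "COMPLETED")           # filter
          .join(customers, on="customer_id", how="inner")   # join
          .groupBy("order_date", "region")                  # aggregate
          .agg(F.sum("amount").alias("total_amount"))
          .orderBy("order_date")                            # sort
)

# Land the curated dataset in S3 in Parquet format for downstream loads.
daily_totals.write.mode("overwrite").parquet("s3://example-bucket/curated/daily_totals/")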



TECHNICAL SKILLS SET
  Big Data Technologies                 Spark, Hive, Hadoop, HDFS, MapReduce, Kafka, Snowflake, Presto, MongoDB.
  Cloud Platforms                       AWS (Lambda, S3, EC2, Redshift, EMR, Glue), Databricks.
  ETL and Data Integration              Apache Airflow, Apache Sqoop, AWS Step Functions, Python, PySpark.
  Databases                             PostgreSQL, Oracle (11g, 12c), SQL Server, Teradata, DB2.
 Programming Languages                  Python (3.7), SQL, PL/SQL, Unix shell scripting, Java, Scala.
 Data Analysis and Visualization        Power BI, Google Analytics, Tableau, Jupyter Notebooks, DAX.
 CI/CD and Version Control              GitHub, Docker, SVN.
 Collaboration Tools                    Jira, Slack, Confluence, SharePoint, Microsoft Teams.
 Operating Systems                      Windows, Linux.
WORK EXPERIENCE:

Walmart Pharmacy, Bentonville, AR                                                                  May 2023 - Till Date
Data Engineer
Roles & Responsibilities:
      Worked on Big Data Integration and Analytics based on Spark, Hive, PostgreSQL, Snowflake, and MongoDB.
      Ingested data into a data lake from different sources and performed various transformations like sort, join,
       aggregations, and filter to process various datasets.
      Constructed data pipelines for pulling data from SQL Server and Hive. Landed the data in AWS S3 and loaded it
       into Snowflake after transforming.
       Worked with structured/semi-structured data ingestion and processing on AWS using S3 and Python, and migrated
        on-premises big data workloads to AWS.
      Built ETLs to load data from Presto, PostgreSQL, Hive, SQL Server to Snowflake using Apache Airflow, Python,
       and Spark.
      Created ETL jobs using Spark to perform data migrations and data loads into HDFS and Hive from different
       source systems.
      Reengineered existing ETL workflows to improve performance by identifying bottlenecks and optimizing code
       accordingly.
       Developed a reusable framework for future migrations that automates ETL from RDBMS systems to the data lake,
        utilizing Spark data sources and Hive data objects.
      Optimized data pipelines by implementing advanced ETL processes and streamlining data flow.
      Migrated data from a legacy Teradata system to MongoDB and built ETLs to load the data into MongoDB.
      Automated data flow between software systems using Apache Airflow.
      Implemented Spark jobs for data preprocessing, validation, normalization, and transmission.
       Wrote Spark code using Python and Scala as the primary programming languages to perform critical
        calculations.
       Used Apache Airflow with Python and Unix to submit Spark batch jobs to the EMR cluster (see the DAG sketch
        after this list).
      Developed data processing triggers for Amazon S3 using AWS Lambda functions with Python.
      Created Lambda functions with Boto3 to deregister unused AMIs in all application regions to reduce the cost of
       EC2 resources.
      Created and provisioned multiple Databricks clusters needed for batch and continuous streaming data
       processing and installed the required libraries for the clusters.
      Utilized GitHub and Docker for the runtime environment for the CI/CD system to build, test, and deploy.
      Utilized Python to implement different Machine learning algorithms, including Generalized Linear Model,
       Random Forest, and Gradient Boosting.
      Consumed data from Kafka sources and implemented an analysis model.
      Created Power BI reports and dashboards as per the business requirement using different Data Sources.
      Extensively used DAX to create complex calculated measures and columns in Power BI and Cubes.
      Participated in code reviews and demonstrated application functionality and configurations to the stakeholders.
      Implemented data governance frameworks to ensure data quality, consistency, and security by developing
       standardized data definitions, implementing validation checks, and ensuring compliance with data policies.
      Collaborated with cross-functional teams to define requirements and develop end-to-end solutions for complex
       data engineering projects.
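
A hedged sketch of the batch orchestration described above: a daily Airflow DAG that submits a spark-submit step to an
existing EMR cluster through Boto3. The cluster id, script path, region, and schedule are hypothetical placeholders, not
the production configuration.

# Minimal Airflow DAG sketch (hypothetical ids and paths): submit a Spark batch
# step to a running EMR cluster once a day.
from datetime import datetime

import boto3
from airflow import DAG
from airflow.operators.python import PythonOperator

def submit_spark_step():
    emr = boto3.client("emr", region_name="us-east-1")
    emr.add_job_flow_steps(
        JobFlowId="j-EXAMPLECLUSTER",  # placeholder EMR cluster id
        Steps=[{
            "Name": "daily_batch_load",
            "ActionOnFailure": "CONTINUE",
            "HadoopJarStep": {
                "Jar": "command-runner.jar",
                "Args": ["spark-submit", "--deploy-mode", "cluster",
                         "s3://example-bucket/jobs/daily_batch_load.py"],
            },
        }],
    )

with DAG(
    dag_id="emr_daily_batch",
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    PythonOperator(task_id="submit_spark_step", python_callable=submit_spark_step)
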
Environment: AWS (Lambda, S3, EC2, Redshift, EMR), Teradata 15, MongoDB, PostgreSQL, SQL, Oracle 12c,
PySpark, Hadoop, Hive, HDFS, Kafka, Airflow, Snowflake, Python 3.7, PyCharm, Jupyter Notebooks, Power BI, GitHub,
Docker, Git, Toad, Unix, Jira, Slack, Confluence, Agile/Scrum, XML.
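
The Lambda-based cost housekeeping mentioned in this role (deregistering unused AMIs with Boto3) might look roughly like
the sketch below. The region, ownership filter, and return payload are illustrative assumptions; pagination and snapshot
cleanup are omitted for brevity.

# Hedged Boto3/Lambda sketch: deregister AMIs not referenced by any EC2 instance.
import boto3

def lambda_handler(event, context):
    ec2 = boto3.client("ec2", region_name="us-east-1")  # placeholder region

    # AMIs owned by this account.
    images = ec2.describe_images(Owners=["self"])["Images"]

    # Image ids currently in use by instances (pagination omitted for brevity).
    reservations = ec2.describe_instances()["Reservations"]
    in_use = {instance["ImageId"]
              for reservation in reservations
              for instance in reservation["Instances"]}

    unused = [image["ImageId"] for image in images if image["ImageId"] not in in_use]
    for image_id in unused:
        ec2.deregister_image(ImageId=image_id)

    return {"deregistered": len(unused)}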

Duke Energy, Charlotte, NC                                                                    May 2020 - Nov 2022
Data Engineer
Roles & Responsibilities:
  Participated in requirements gathering and collaborated closely with architects and SMEs in designing and modeling
    data solutions.
  Handled data ingestions from various data sources, performed transformations using Spark, and loaded data into
    Amazon S3 and HDFS.
  Converted Hive/SQL queries into Spark transformations/actions using PySpark.
  Moved flat files generated from various feeds to Amazon S3 for further processing.
  Developed ETL tools to load data from source to target using Python, PySpark, Sqoop, Unix, and Amazon Redshift.
  Created Sqoop incremental imports, landed data in Parquet format in HDFS, and transformed it to ORC format using
    PySpark.
  Created reusable utilities and programs in Python to perform repetitive tasks such as sending emails and comparing
    data.

  Developed and maintained PL/SQL procedures to load data sent in XML files into Oracle tables.
  Used Sqoop to import data from RDBMS to Hadoop Distributed File System (HDFS) and analyzed the imported data
    using Hive.
  Created UNIX shell scripts to load data from flat files into Oracle tables.
  Developed Sqoop scripts to ingest data from Oracle, Teradata, and DB2 into HDFS and Hive.
  Developed Python and Hive scripts for creating reports from Hive data.
  Managed and monitored AWS resources such as EC2, S3, Redshift, and EMR clusters.
  Implemented data pipeline orchestration using AWS Glue and AWS Lambda functions.
  Developed MapReduce/Spark Python modules for machine learning and predictive analytics in Hadoop on AWS, and
    implemented a Python-based distributed random forest via Python Streaming (an illustrative Spark MLlib sketch
    appears after this section's environment list).
  Automated workflows and data processing using Apache Airflow and AWS Step Functions.
  Utilized Cloudera Impala for efficient querying and analysis of large datasets, optimizing performance and resource
    usage.
  Collaborated on ETL (Extract, Transform, Load) tasks, maintaining data integrity and verifying pipeline stability.
  Fine-tuned query performance and optimized database structures for faster, more accurate data retrieval and
    reporting.
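
The Parquet-to-ORC conversion of Sqoop-landed data described above can be illustrated with a short PySpark sketch; the
HDFS paths and partition column are hypothetical placeholders.

# Minimal PySpark sketch: read Sqoop-landed Parquet from HDFS and rewrite as ORC.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("parquet_to_orc").getOrCreate()

df = spark.read.parquet("hdfs:///landing/orders_parquet/")
(df.write
   .mode("overwrite")
   .partitionBy("load_date")   # hypothetical partition column
   .orc("hdfs:///curated/orders_orc/"))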

Environment: AWS (EC2, S3, Redshift, EMR, Glue, Lambda), Hadoop, Hive, Spark, HDFS, MapReduce, Python, PyCharm,
PostgreSQL, Oracle 11g, SQL, PL/SQL, TOAD, Unix, SharePoint, Teradata, DB2, SVN, Java, Eclipse.
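
As referenced above, the distributed random forest work could also be expressed in the same stack with Spark MLlib; this
is only an illustrative stand-in for the Python streaming implementation, and the feature columns, label, and paths are
hypothetical.

# Hedged Spark MLlib sketch: assemble features and train a distributed random forest.
from pyspark.sql import SparkSession
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.classification import RandomForestClassifier

spark = SparkSession.builder.appName("rf_example").getOrCreate()

data = spark.read.parquet("hdfs:///features/training_events/")
assembler = VectorAssembler(inputCols=["feature_a", "feature_b", "feature_c"],
                            outputCol="features")
train = assembler.transform(data)

rf = RandomForestClassifier(labelCol="label", featuresCol="features", numTrees=100)
model = rf.fit(train)
model.write().overwrite().save("hdfs:///models/random_forest")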


IQVIA, India                                                                          May 2019 - April 2020
Data Analyst
Roles & Responsibilities:
    Analyzed large datasets to identify trends, patterns, and insights that informed business decisions.
    Utilized SQL to query databases and retrieve necessary information for analysis.
    Created custom reports and dashboards in Google Analytics to visualize key metrics and trends, providing marketing
    and business teams with actionable insights.
    Utilized Python for basic data analysis tasks, including data exploration, statistical analysis, and summarizing key
    findings.
    Maintained documentation of data analysis processes, findings, and methodologies for future reference.
    Collaborated with cross-functional teams to define key performance indicators (KPIs) and establish tracking
    mechanisms in Google Analytics.
    Gained valuable experience working within a specific industry, applying learned concepts directly into relevant work
    situations.
    Provided ad-hoc analysis support for various business needs and projects.
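
A minimal sketch of the kind of exploratory analysis described above: pull records with SQL, then summarize a monthly
trend with pandas. The connection string, table, and column names are hypothetical placeholders.

# Hedged pandas sketch: query a table and aggregate sessions by channel and month.
import pandas as pd
from sqlalchemy import create_engine

engine = create_engine("postgresql://user:password@localhost:5432/analytics")  # placeholder

df = pd.read_sql("SELECT visit_date, channel, sessions FROM web_traffic", engine)
df["visit_date"] = pd.to_datetime(df["visit_date"])

monthly = (df.groupby(["channel", pd.Grouper(key="visit_date", freq="MS")])["sessions"]
             .sum()
             .reset_index())

print(monthly.head())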

Environment: Python, SQL, Google Analytics, Tableau, Microsoft Excel, Jupyter Notebooks, Windows, Linux, Microsoft
Teams, Slack
