Candidate's Name | DATA ENGINEER
LinkedIn | EMAIL AVAILABLE | PHONE NUMBER AVAILABLE
PROFESSIONAL SUMMARY
Proficient in analyzing requirements, designing, developing, testing, and managing complete data
management and processing systems, including process documentation and ETL technical and design
documents.
Orchestrated the development and execution of ETL pipelines using Databricks for efficient data
ingestion, transformation, and loading, resulting in a scalable and maintainable data ecosystem.
Experienced in aggregating data with Kafka, HDFS, Hive, Scala, and Spark Streaming on Azure.
Well versed with Big Data on Azure cloud services (Virtual Machines, HDInsight, Blob Storage, Data
Factory, Cosmos DB, and Synapse Analytics).
Engineered a dynamic GraphQL schema generation process that adapted to changes in the underlying
graph ontology model, ensuring agility in development and deployment cycles.
Developed, deployed, and managed Azure Functions for event-driven and scheduled tasks, responding to Azure events for logging, monitoring, and security, and running scheduled backups.
Utilized advanced features of GraphQL subscriptions to implement real-time data updates across web
and mobile applications, enhancing user engagement and application value.
Familiar with integrating Spark Streaming with Kafka for real-time data processing and practical
knowledge of DBT on Azure cloud.
Leveraged Databricks Delta Lake's CDC (Change Data Capture) capabilities to capture and process
incremental data changes from streaming and transactional sources, ensuring up-to-date analytics
insights for real-time decision-making.
Experienced in Data Analysis, Data Profiling, Data Integration, Migration, Data Governance,
Metadata Management, Master Data Management (MDM), and Configuration Management.
Utilized Spark Streaming APIs for on-the-fly transformations into a common data model, receiving real-time data from Confluent Kafka and storing it in Snowflake (illustrative sketch at the end of this summary).
Designed effective data schemas and storage models in Cassandra to optimize data access patterns
and support complex queries, significantly improving application responsiveness and user satisfaction.
Refactored and updated Python codebases to be compatible with the latest Python versions and cloud environments, incorporating modern libraries and frameworks to enhance functionality and security.
Proficient in SQL across several dialects (MySQL, PostgreSQL, SQL Server, and Oracle) and experienced
in working with NoSQL databases like HBase, Cosmos DB.
Orchestrated end-to-end ETL workflows using Databricks on Azure, leveraging Spark to efficiently
process large datasets, enabling scalable data preparation for analytics and business intelligence
applications.
Hands-on experience with the Spark and Scala APIs, comparing Spark's performance with Hive and SQL, and using Spark SQL to manipulate DataFrames in Scala.
Innovated in the realm of dimensional data modeling, designing star and snowflake schemas that
facilitated faster data retrieval and more intuitive data analysis, leading to improved business insights.
Skilled in implementing and orchestrating data pipelines using Oozie and Airflow.
Developed custom Looker extensions and applications using Looker's extension framework,
providing tailored analytics solutions that enhanced operational efficiency by 25%.
Extensively worked with Teradata utilities such as FastExport and MultiLoad to export and load data to/from various sources, including flat files. Developed MapReduce programs using Apache Hadoop for big data analysis.
Developed ETL pipelines into and out of data warehouses using a combination of Python and SnowSQL.
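Illustrative sketch only (not code from any engagement above): a minimal PySpark Structured Streaming job for the Kafka-to-Snowflake pattern described in this summary. The broker, topic, event schema, table, and all Snowflake connection options are hypothetical placeholders, and the spark-snowflake connector is assumed to be installed on the cluster.

    # Minimal sketch: read JSON events from a Kafka topic and append each
    # micro-batch to a Snowflake table. All names below are placeholders.
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col, from_json
    from pyspark.sql.types import StringType, StructType, TimestampType

    spark = SparkSession.builder.appName("kafka-to-snowflake").getOrCreate()

    event_schema = (StructType()
                    .add("event_id", StringType())
                    .add("event_type", StringType())
                    .add("event_ts", TimestampType()))

    raw = (spark.readStream
           .format("kafka")
           .option("kafka.bootstrap.servers", "broker:9092")   # placeholder broker
           .option("subscribe", "events")                       # placeholder topic
           .load())

    events = (raw.selectExpr("CAST(value AS STRING) AS json")
              .select(from_json(col("json"), event_schema).alias("e"))
              .select("e.*"))

    # Placeholder connection options for the spark-snowflake connector.
    SF_OPTIONS = {
        "sfURL": "account.snowflakecomputing.com",
        "sfDatabase": "ANALYTICS",
        "sfSchema": "PUBLIC",
        "sfWarehouse": "LOAD_WH",
        "sfUser": "etl_user",
        "sfPassword": "***",
    }

    def write_to_snowflake(batch_df, batch_id):
        # Append each micro-batch to the target table through the connector.
        (batch_df.write
         .format("net.snowflake.spark.snowflake")
         .options(**SF_OPTIONS)
         .option("dbtable", "EVENTS")
         .mode("append")
         .save())

    (events.writeStream
     .foreachBatch(write_to_snowflake)
     .option("checkpointLocation", "/tmp/checkpoints/events")
     .start())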
EDUCATION
B.Tech (ECE), JNTUA, 2014, with an 80% aggregate
TECHNICAL SKILLS
Big Data: HDFS, GPFS, Hive, Sqoop, Snowflake, Spark, YARN, Pig, Kafka, DB2, Confluent Kafka, Databricks, SCD (Type 1, Type 2, Type 3), GraphQL
Hadoop Distributions: Hortonworks, Cloudera, IBM BigInsights, AWS EMR
Operating Systems: Windows, Linux (CentOS, Ubuntu)
Programming Languages: Python, Spark, Scala, Shell Scripting, SQL, PySpark
Databases: Hive, MySQL, Netezza, SQL Server, Redshift, Snowflake, DynamoDB
IDE Tools & Utilities: IntelliJ IDEA, Eclipse, PyCharm, Git, Jenkins, Docker, GitHub, Terraform, Airflow
Streaming Platforms: Apache Kafka, Confluent Kafka, Spark Structured Streaming, Azure Stream Analytics, Apache Flink, Kafka Streams, DBT on Azure
ETL: DataStage 9.1/11.5 (Designer/Monitor/Director), AWS Glue, Snowpipe
Job Schedulers: Control-M, Ambari, Apache Airflow, AWS Data Pipeline
Reporting Tools: Tableau, Tableau Desktop, Power BI
Cloud Computing Tools: Azure (Virtual Machines, Blob Storage, HDInsight, Cosmos DB, Data Factory, Functions, Synapse Analytics, Data Catalog), GCP, Snowflake
File Types: JSON, XML, Parquet, text, CSV
Methodologies: Agile (Scrum), Jira
Others: MS Office, ServiceNow, Optim, WinSCP, MS Visio, Cloudera Manager, AWS CLI, Databricks Secret Manager, Apache Kafka, Teradata
PROFESSIONAL EXPERIENCE
Maximus, Denver, CO (Remote) | Oct 2021 to Present
Role: AZURE DATA ENGINEER
RESPONSIBILITIES:
Developed and optimized ETL workflows using Azure Data Factory, automating data integration and
transformation processes to support large-scale analytics.
Orchestrated end-to-end ETL workflows using Databricks on Azure, leveraging Spark to efficiently
process large datasets, enabling scalable data preparation for analytics and business intelligence
applications.
Utilized Databricks Delta Lake to perform merge, update, and delete operations within data lakes, enabling dynamic and efficient data management practices that support evolving data schemas and real-time analytics needs (illustrative sketch below).
Designed and implemented complex data processing pipelines with Apache Airflow, automating ETL
workflows and improving data reliability and processing time by 30%.
Configured and maintained scalable and secure data warehouses with Azure Synapse Analytics,
enhancing data aggregation and query performance.
Scripted in Python and used Snowflake's SnowSQL for data cleansing and preparation, ensuring high-quality data for business intelligence applications.
Implemented transformations within Databricks notebooks using PySpark and SparkSQL,
optimizing data structures for analytics and reporting, significantly improving query performance and
insights accuracy.
Designed and deployed robust data pipelines with Azure Data Factory and orchestrated complex data
flows, ensuring timely and accurate data availability.
Architected a cloud-based data warehousing solution in Snowflake, optimizing data storage and
computation to handle over 10TB of data with dynamic scaling, significantly improving query
performance and user access.
Employed Apache Spark and Databricks for complex data processing and analytics, leveraging Spark
SQL to handle massive datasets efficiently.
Optimized Spark configurations within Databricks clusters to tailor resource allocation (executor
memory, cores) for specific workloads, achieving significant improvements in job performance and cost
efficiency.
Integrated Azure Data Factory and Databricks for automated data pipeline orchestration, facilitating seamless data ingestion from various sources into Delta Lake, supporting real-time and batch processing workflows.
Developed and maintained semantic data models to support knowledge representation and
reasoning, enabling advanced analytics and machine learning applications, which enhanced decision-
making processes.
Configured Databricks clusters to auto-scale based on workload demands, ensuring cost-effective use
of Azure resources while maintaining performance for ETL and data transformation tasks.
Implemented GraphQL APIs for seamless access to graph-based data stores, optimizing query
performance and enabling flexible, client-specific data fetching that improved application
responsiveness and user satisfaction.
Designed and implemented dimensional data models, facilitating efficient OLAP queries and
enhancing data analysis capabilities, resulting in a 20% increase in query performance.
Participated in DevOps activities, facilitating continuous integration and delivery (CI/CD) for data
applications and infrastructure as code (IaC) using Terraform.
Automated the scaling of Databricks clusters based on workload patterns using Azure Functions, ensuring that computational resources are dynamically adjusted to workload demands, optimizing cost and performance.
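Illustrative sketch only of the Delta Lake merge/update/delete pattern referenced above, assuming a Databricks notebook where spark is predefined and changes_df is a DataFrame of incoming change records; table, key, and flag names are hypothetical.

    # Minimal sketch: upsert a batch of change records into a Delta table,
    # deleting rows flagged for removal. Names are placeholders.
    from delta.tables import DeltaTable

    target = DeltaTable.forName(spark, "silver.customers")   # existing Delta table

    (target.alias("t")
     .merge(changes_df.alias("s"), "t.customer_id = s.customer_id")
     .whenMatchedDelete(condition="s.op = 'DELETE'")   # drop rows marked deleted
     .whenMatchedUpdateAll()                           # apply updates to existing rows
     .whenNotMatchedInsertAll()                        # insert brand-new rows
     .execute())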
TECH STACK: Azure (Data Factory, Synapse Analytics, Functions, Data Lake Store, Data Catalog), LookML, Databricks, Apache Spark, Python, GraphQL, Snowflake, SQL, SCD, Hadoop, Git, Jira, Shell Scripting, Spark SQL, Parquet files, Data Cleansing, Data Warehousing, DevOps
CBRE, Dallas, TX (Remote) | Apr 2019 to Sep 2021
Role: Sr. BIG DATA ENGINEER
RESPONSIBILITIES:
Masterminded and optimized ETL jobs using Azure Data Factory, streamlining data integration and
transformation, which significantly improved data pipeline performance.
Orchestrated job scheduling and automation of data processes using Azure Logic Apps and Azure
Functions, enhancing operational efficiency.
Leveraged Azure Functions for serverless data operations, facilitating efficient data transformation
tasks.
Leveraged Databricks Auto Loader for efficient and incremental data ingestion from cloud storage services, minimizing latency in data availability and ensuring datasets are up to date for real-time analytics applications (illustrative sketch below).
Spearheaded the migration of legacy ETL processes to Databricks PySpark, optimizing data
transformation and loading speeds by 30% and enabling real-time data analytics.
Directed the use of Apache Airflow (hosted on Azure VMs and AKS) to orchestrate complex workflows, manage dependencies, and schedule jobs with precision.
Led the migration of traditional on-premises data warehouses to Snowflake, significantly enhancing data processing speed by leveraging Snowflake's multi-cluster, shared-data architecture and reducing infrastructure costs by 30%.
Utilized Databricks SQL Analytics to execute high-performance SQL queries on Delta Lake, providing
analysts and business users with a familiar SQL interface to explore and visualize big data with minimal
learning curve.
Utilized GraphQL schemas to define and enforce data structures and relationships, ensuring data
integrity and consistency across microservices architectures, and reducing data redundancy.
Leveraged the Looker IDE to efficiently develop and debug LookML code, incorporating version control
practices to manage changes and collaboration across the data team.
Enhanced data security by implementing field-level encryption within Delta Lake using Databricks,
securing sensitive information while allowing for analytics on encrypted data without compromising
privacy.
Constructed interactive visualizations and dashboards with Tableau, turning complex data into
actionable insights for business stakeholders.
Working knowledge of Python libraries such as NumPy, SciPy, Matplotlib, urllib2, pandas (DataFrames), and PyTables, enhancing data analysis and processing capabilities.
Expert in using statistical aggregate functions (such as COUNT, SUM, AVG, MIN, MAX, VAR, and
STDDEV) to derive meaningful insights from data.
Employed the LookML Validator to catch syntax errors and optimize code efficiency before
deployment, ensuring high-quality models and a seamless user experience.
Developed and managed data transformation and migration processes using Azure Data Factory for
optimized data flow between various sources.
Utilized SQL Runner for advanced data exploration and query performance testing, identifying
optimization opportunities that led to a 30% reduction in load times for critical dashboards.
Utilized Databricks REST API to manage clusters, jobs, and notebooks, enabling automation of data
workflows and integration with external applications and services for comprehensive data ecosystem
management.
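Illustrative sketch only of the Auto Loader incremental-ingestion pattern noted above, assuming a Databricks notebook where spark is predefined; the landing, schema, and checkpoint paths and the target table name are hypothetical.

    # Minimal sketch: Auto Loader picks up newly arrived JSON files from cloud
    # storage and appends them to a bronze table. Paths/table are placeholders.
    stream = (spark.readStream
              .format("cloudFiles")
              .option("cloudFiles.format", "json")
              .option("cloudFiles.schemaLocation", "/mnt/meta/schemas/orders")
              .load("/mnt/landing/orders/"))

    (stream.writeStream
     .option("checkpointLocation", "/mnt/meta/checkpoints/orders")
     .trigger(availableNow=True)     # drain all newly arrived files, then stop
     .toTable("bronze.orders"))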
TECH STACK: Azure (VMs, Blob Storage, Data Factory, Cosmos DB, SQL Database, Synapse Analytics, Functions, Stream Analytics, Logic Apps), LookML, SQL Runner, Snowflake, SnowSQL, Snowpipe, GraphQL, Python, SQL, Airflow, Big Data, Data Modeling, Toad, Tableau, MongoDB, UNIX & Linux, Shell Scripting
Global Atlantic Financial Group, Indianapolis, IN | Feb 2017 to Mar 2019
Role: BIG DATA ENGINEER
RESPONSIBILITIES:
Crafted real-time data processing systems using Apache Spark Structured Streaming and Kafka,
integrated within the Azure ecosystem for improved data flow.
Utilized Azure HDInsight for managing big data processing clusters, employing Scala and PySpark for
executing complex transformations and aggregating financial data efficiently.
Developed and deployed robust ETL pipelines using Databricks PySpark, integrating data from
multiple sources into a cohesive data warehousing solution, enhancing data quality and availability.
Deployed Spark Streaming APIs to perform real-time transformations, building a common data model
to persist data from Confluent Kafka to Snowflake.
Engineered streaming applications with Python and PySpark to transfer data from Kafka topics directly into Snowflake, operating within the Azure cloud environment.
Demonstrated expertise in Data warehousing concepts, effectively managing OLTP and OLAP systems,
and implementing dimensional models including Facts and Dimensions to support complex business
intelligence requirements.
Utilized Airflow's dynamic DAG capabilities to programmatically generate and manage workflows,
enabling scalable and flexible data processing solutions tailored to business needs.
Optimized DataStage jobs for performance, focusing on parallel processing, effective memory
management, and ETL process tuning for large data volumes.
Designed and implemented data models and transformations using DBT, streamlining data workflows
and accelerating the delivery of data insights.
Utilized Azure Data Factory for ETL operations, automating data extraction, transformation, and loading
processes to enhance data quality and accessibility.
Proficient in setting up job sequences and schedules using the DataStage Scheduler for timely data availability.
Developed Spark applications focusing on data extraction, transformation, and aggregation from diverse file types for deep analysis of customer behavior.
Utilized Docker to containerize and manage data-intensive applications, ensuring scalable and consistent
performance across different environments.
Developed Kafka producers and consumers, HBase clients, and Spark jobs using Python, integrating
components on Azure Blob Storage and Azure Data Lake Storage.
Optimized query performance for complex SQL scripts involving statistical aggregates, ensuring fast
and efficient data retrieval.
Developed a Kafka producer API for streaming data into Kafka topics, and Spark Streaming applications to process and store data from those topics in HBase (illustrative sketch below).
Engaged in a Production Environment, building CI/CD pipelines with Jenkins, covering all stages from
GitHub code checkout to deployment in Azure environments.
Managed resilient and scalable data storage solutions using Azure Blob Storage and Azure Cosmos DB,
securing data and enabling efficient query performance.
Enhanced data analysis and reporting capabilities by utilizing Toad's Data Point tool for data preparation, analysis, and visualization, supporting data-driven decision-making with timely and accurate data insights.
Collaborated with cross-functional teams to align data engineering efforts with strategic goals,
delivering impactful data-driven solutions.
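Illustrative sketch only of the Kafka producer pattern described above, using the confluent-kafka Python client; the broker address, topic, key, and payload are hypothetical placeholders.

    # Minimal sketch: publish JSON events to Kafka and report delivery failures.
    import json
    from confluent_kafka import Producer

    producer = Producer({"bootstrap.servers": "broker:9092"})   # placeholder broker

    def on_delivery(err, msg):
        # Invoked by poll()/flush() once per message; log failures for alerting.
        if err is not None:
            print(f"Delivery failed for key {msg.key()}: {err}")

    def publish(topic, key, payload):
        producer.produce(topic,
                         key=key,
                         value=json.dumps(payload).encode("utf-8"),
                         callback=on_delivery)
        producer.poll(0)   # serve pending delivery callbacks

    publish("transactions", "cust-123", {"amount": 42.50, "currency": "USD"})
    producer.flush()       # block until all queued messages are delivered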
TECH STACK: Apache Spark, Hive, HBase, Confluent Kafka, Databricks, Azure HDInsight, LookML, Blob Storage, Azure Data Factory, Cosmos DB, GraphQL, Airflow, Spark Streaming, IBM DataStage, Toad, DBT, Docker, PySpark, Python, SQL, Scala, Big Data, Jenkins, Terraform, GitHub
Avon Technologies Pvt Ltd, Hyderabad, India | Sep 2015 to Jan 2017
Role: Sr. DATA ENGINEER
RESPONSIBILITIES:
Designed data integration solutions with Informatica PowerCenter, facilitating streamlined ETL
processes into a centralized data warehouse.
Managed Azure SQL Database clusters, applying SQL expertise to data warehousing tasks, thereby optimizing storage and enhancing query performance for significant data volumes.
Deployed real-time data processing pipelines employing Apache Kafka and Java, ensuring efficient data
ingestion and distribution suitable for analytics.
Developed scalable big data applications with Hadoop and Apache Spark, improving data processing
capabilities to support sophisticated analytical demands.
Automated system tasks in Unix, boosting efficiencies and reducing manual interventions within data
operations.
Implemented Airflow sensors to monitor the completion of upstream tasks, ensuring seamless workflow execution and minimizing delays in data processing pipelines (illustrative sketch below).
Oversaw Hadoop clusters managed by Cloudera, maintaining high availability and scalability for
essential data processes.
Engineered data transformations and routines in Python and Java, augmenting data utility for business
intelligence and machine learning applications.
Utilized Azure Functions and Azure Virtual Machines for deploying and managing applications,
capitalizing on cloud scalability to meet fluctuating data needs.
Enhanced data integration capabilities with Talend, complementing Informatica PowerCenter's
functionality and improving data integrity.
Orchestrated efficient data workflows with Sqoop, facilitating the transfer of bulk data between Hadoop
and relational databases like Oracle and MySQL.
Generated accessible data insights for stakeholders through intuitive visualizations and dashboards
created with Tableau, aiding in informed decision-making.
Executed thorough data audits and upheld governance standards to ensure data accuracy and
regulatory adherence.
Fostered continuous improvement in data practices by aligning data engineering projects with
organizational objectives, in collaboration with cross-functional teams.
Developed comprehensive documentation for data models, ETL workflows, and data management
protocols to ensure transparency and facilitate knowledge sharing.
Educated junior engineers and team members on data management best practices, enhancing team
proficiency in using Informatica PowerCenter and Azure SQL Database.
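Illustrative sketch only of the upstream-completion sensor pattern mentioned above, written against Airflow 2.x; the DAG ids, task ids, schedule, and script path are hypothetical placeholders.

    # Minimal sketch: wait for an upstream DAG's task to finish before running
    # the downstream warehouse load.
    from datetime import datetime

    from airflow import DAG
    from airflow.operators.bash import BashOperator
    from airflow.sensors.external_task import ExternalTaskSensor

    with DAG(
        dag_id="load_after_ingest",
        start_date=datetime(2024, 1, 1),
        schedule_interval="@daily",
        catchup=False,
    ) as dag:
        wait_for_ingest = ExternalTaskSensor(
            task_id="wait_for_ingest",
            external_dag_id="ingest_raw_data",   # upstream DAG (placeholder)
            external_task_id="land_files",       # upstream task (placeholder)
            poke_interval=300,                   # check every 5 minutes
            timeout=6 * 60 * 60,                 # give up after 6 hours
            mode="reschedule",                   # free the worker slot between checks
        )

        load_warehouse = BashOperator(
            task_id="load_warehouse",
            bash_command="python /opt/jobs/load_warehouse.py",   # placeholder script
        )

        wait_for_ingest >> load_warehouse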
TECH STACK: Hadoop, Apache Spark, Kafka, Informatica PowerCenter, Oracle, MySQL, Python, Talend, Airflow,
Tableau, Azure Virtual Machines, Azure Functions, Azure SQL Database, Data Warehousing, Cloudera, Unix
Scripting, Hive, Java, Sqoop.
Hudda InfoTech Private Limited, Hyderabad, India | Jan 2014 to Aug 2015
Role: DATA ENGINEER
RESPONSIBILITIES:
Led real-time data processing and intraday analysis using Apache Spark, harnessing its in-memory
processing to enhance performance.
Developed and managed robust ETL processes with Informatica, ensuring efficient data integration and
transformation.
Mastered ETL processes using Informatica PowerCenter, automating data integration tasks that
improved data quality and workflow efficiency.
Orchestrated complex data flow pipelines, leveraging the power of traditional relational database
management systems like SQL Server and Oracle.
Implemented CI/CD pipelines using Jenkins and managed Hadoop workflows with Apache Oozie,
automating deployment processes and batch jobs to enhance productivity.
Created complex transformations and data flows using Informatica PowerCenter, aligning with
business intelligence requirements.
Conducted ETL operations from diverse source systems to centralized relational databases using T-SQL
and PL/SQL for analytical processing.
Optimized SQL queries and database schemas in SQL Server and Oracle, enhancing performance
and reliability for critical financial transaction systems.
Managed a large Hadoop cluster with HDFS, optimizing storage and computing resources to handle
petabyte-scale data efficiently, reducing operational costs by 20%.
Authored Python and Bash scripts to automate data transformation and loading processes, focusing on
on-premises data platforms.
Utilized the Hadoop ecosystem, including Hive and HDFS, to manage large-scale data storage and table creation with HiveQL and the Scala API (illustrative sketch below).
Configured and managed workflow orchestration with Apache Oozie, developing complex workflows in
Python.
Deployed big data applications using Talend Open Studio, focusing on data provenance and lineage
through systems like Teradata and IBM DB2.
Crafted interactive dashboards in Tableau Desktop, facilitating real-time data analysis and decision-
making for end users.
Executed data mapping and transformation processes for comprehensive data extraction and loading.
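Illustrative sketch only, in the spirit of the HiveQL work above: creating and loading a partitioned Hive table through Spark SQL. The database, table, columns, and staging path are hypothetical placeholders.

    # Minimal sketch: create a partitioned Hive table and append one day's
    # extract into a static partition via Spark SQL.
    from pyspark.sql import SparkSession

    spark = (SparkSession.builder
             .appName("hive-partition-load")
             .enableHiveSupport()          # lets Spark read/write the Hive metastore
             .getOrCreate())

    spark.sql("""
        CREATE TABLE IF NOT EXISTS finance.transactions (
            txn_id     STRING,
            account_id STRING,
            amount     DECIMAL(18, 2)
        )
        PARTITIONED BY (txn_date STRING)
        STORED AS PARQUET
    """)

    # Stage the daily extract and insert it into the matching partition.
    daily_df = spark.read.parquet("/data/staging/transactions/2015-01-15/")
    daily_df.createOrReplaceTempView("staged_txns")

    spark.sql("""
        INSERT INTO TABLE finance.transactions PARTITION (txn_date = '2015-01-15')
        SELECT txn_id, account_id, amount FROM staged_txns
    """)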
TECH STACK: Apache Spark, Hadoop, PySpark, HDFS, Cloudera, Informatica PowerCenter, SQL Server, Oracle,
Hive, Jenkins, Teradata, IBM DB2, Shell Scripting, Tableau Desktop, Apache Oozie, Talend Open Studio.