Candidate's Name | DATA ENGINEER
LinkedIn | EMAIL AVAILABLE | PHONE NUMBER AVAILABLE
PROFESSIONAL SUMMARY
Proficient in analyzing requirements, designing, developing, testing, and managing complete data
management and processing systems, including process documentation and ETL technical and design
documents.
Orchestrated the development and execution of ETL pipelines using Databricks for efficient data
ingestion, transformation, and loading, resulting in a scalable and maintainable data ecosystem.
Experienced in aggregating data with Kafka, HDFS, Hive, Scala, and Spark Streaming on Azure.
Well versed with Big Data on Azure cloud services (Virtual Machines, HDInsight, Blob Storage, Data
Factory, Cosmos DB, and Synapse Analytics).
Engineered a dynamic GraphQL schema generation process that adapted to changes in the underlying
graph ontology model, ensuring agility in development and deployment cycles.
Developed, deployed, and managed Azure Functions for event-driven and scheduled tasks, responding to Azure events for logging, monitoring, and security, and running scheduled backups.
Utilized advanced features of GraphQL subscriptions to implement real-time data updates across web
and mobile applications, enhancing user engagement and application value.
Familiar with integrating Spark Streaming with Kafka for real-time data processing and practical
knowledge of DBT on Azure cloud.
Leveraged Databricks Delta Lake's CDC (Change Data Capture) capabilities to capture and process
incremental data changes from streaming and transactional sources, ensuring up-to-date analytics
insights for real-time decision-making.
Experienced in Data Analysis, Data Profiling, Data Integration, Migration, Data Governance,
Metadata Management, Master Data Management (MDM), and Configuration Management.
Utilized Spark Streaming APIs for on-the-fly transformations into a common data model, receiving real-time data from Confluent Kafka and storing it in Snowflake (illustrative sketch at the end of this summary).
Designed effective data schemas and storage models in Cassandra to optimize data access patterns
and support complex queries, significantly improving application responsiveness and user satisfaction.
Refactored and updated Python codebases to be compatible with the latest Python versions and cloud environments, incorporating modern libraries and frameworks to enhance functionality and security.
Proficient in SQL across several dialects (MySQL, PostgreSQL, SQL Server, and Oracle) and experienced
in working with NoSQL databases like HBase, Cosmos DB.
Orchestrated end-to-end ETL workflows using Databricks on Azure, leveraging Spark to efficiently
process large datasets, enabling scalable data preparation for analytics and business intelligence
applications.
Hands-on experience with the Spark and Scala APIs, comparing Spark's performance with Hive and SQL, and using Spark SQL to manipulate DataFrames in Scala.
Innovated in the realm of dimensional data modeling, designing star and snowflake schemas that
facilitated faster data retrieval and more intuitive data analysis, leading to improved business insights.
Skilled in implementing and orchestrating data pipelines using Oozie and Airflow.
Developed custom Looker extensions and applications using Looker's extension framework,
providing tailored analytics solutions that enhanced operational efficiency by 25%.
Extensively worked with Teradata utilities such as FastExport and MultiLoad to export and load data to/from various sources, including flat files. Developed MapReduce programs using Apache Hadoop for big data analysis.
Developed ETL pipelines into and out of data warehouses using a combination of Python and SnowSQL.
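Illustrative sketch only (not code from any engagement above): a minimal PySpark Structured Streaming job for the Kafka-to-Snowflake pattern described in this summary. The broker, topic, event schema, table, and all Snowflake connection options are hypothetical placeholders, and the spark-snowflake connector is assumed to be installed on the cluster.

    # Minimal sketch: read JSON events from a Kafka topic and append each
    # micro-batch to a Snowflake table. All names below are placeholders.
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col, from_json
    from pyspark.sql.types import StringType, StructType, TimestampType

    spark = SparkSession.builder.appName("kafka-to-snowflake").getOrCreate()

    event_schema = (StructType()
                    .add("event_id", StringType())
                    .add("event_type", StringType())
                    .add("event_ts", TimestampType()))

    raw = (spark.readStream
           .format("kafka")
           .option("kafka.bootstrap.servers", "broker:9092")   # placeholder broker
           .option("subscribe", "events")                       # placeholder topic
           .load())

    events = (raw.selectExpr("CAST(value AS STRING) AS json")
              .select(from_json(col("json"), event_schema).alias("e"))
              .select("e.*"))

    # Placeholder connection options for the spark-snowflake connector.
    SF_OPTIONS = {
        "sfURL": "account.snowflakecomputing.com",
        "sfDatabase": "ANALYTICS",
        "sfSchema": "PUBLIC",
        "sfWarehouse": "LOAD_WH",
        "sfUser": "etl_user",
        "sfPassword": "***",
    }

    def write_to_snowflake(batch_df, batch_id):
        # Append each micro-batch to the target table through the connector.
        (batch_df.write
         .format("net.snowflake.spark.snowflake")
         .options(**SF_OPTIONS)
         .option("dbtable", "EVENTS")
         .mode("append")
         .save())

    (events.writeStream
     .foreachBatch(write_to_snowflake)
     .option("checkpointLocation", "/tmp/checkpoints/events")
     .start())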
EDUCATION
B.Tech (ECE), JNTUA, 2014, with an 80% aggregate
TECHNICAL SKILLS
Big Data: HDFS, GPFS, Hive, Sqoop, Snowflake, Spark, YARN, Pig, Kafka, DB2, Confluent Kafka, Databricks, SCD (Type 1, Type 2, Type 3), GraphQL
Hadoop Distributions: Hortonworks, Cloudera, IBM BigInsights, AWS EMR
Operating Systems: Windows, Linux (CentOS, Ubuntu)
Programming Languages: Python, Spark, Scala, Shell Scripting, SQL, PySpark
Databases: Hive, MySQL, Netezza, SQL Server, Redshift, Snowflake, DynamoDB
IDE Tools & Utilities: IntelliJ IDEA, Eclipse, PyCharm, Git, Jenkins, Docker, GitHub, Terraform, Airflow
Streaming Platforms: Apache Kafka, Confluent Kafka, Spark Structured Streaming, Azure Stream Analytics, Apache Flink, Kafka Streams, DBT on Azure
ETL: DataStage 9.1/11.5 (Designer/Monitor/Director), AWS Glue, Snowpipe
Job Schedulers: Control-M, Ambari, Apache Airflow, AWS Data Pipeline
Reporting Tools: Tableau, Tableau Desktop, Power BI
Cloud Computing Tools: Azure (Virtual Machines, Blob Storage, HDInsight, Cosmos DB, Data Factory, Functions, Synapse Analytics, Data Catalog), GCP, Snowflake
File Types: JSON, XML, Parquet, text, CSV
Methodologies: Agile (Scrum), Jira
Others: MS Office, ServiceNow, Optim, WinSCP, MS Visio, Cloudera Manager, AWS CLI, Databricks Secret Manager, Apache Kafka, Teradata
PROFESSIONAL EXPERIENCE
Maximus, Denver, CO (Remote) | Oct 2021 to Present
Role: AZURE DATA ENGINEER
RESPONSIBILITIES:
Developed and optimized ETL workflows using Azure Data Factory, automating data integration and
transformation processes to support large-scale analytics.
Orchestrated end-to-end ETL workflows using Databricks on Azure, leveraging Spark to efficiently
process large datasets, enabling scalable data preparation for analytics and business intelligence
applications.
Utilized Databricks Delta Lake to perform merge, update, and delete operations within data lakes, enabling dynamic and efficient data management practices that support evolving data schemas and real-time analytics needs (illustrative sketch below).
Designed and implemented complex data processing pipelines with Apache Airflow, automating ETL
workflows and improving data reliability and processing time by 30%.
Configured and maintained scalable and secure data warehouses with Azure Synapse Analytics,
enhancing data aggregation and query performance.
Scripted in Python and used Snowflake's SnowSQL for data cleansing and preparation, ensuring high-quality data for business intelligence applications.
Implemented transformations within Databricks notebooks using PySpark and SparkSQL,
optimizing data structures for analytics and reporting, significantly improving query performance and
insights accuracy.
Designed and deployed robust data pipelines with Azure Data Factory and orchestrated complex data
flows, ensuring timely and accurate data availability.
Architected a cloud-based data warehousing solution in Snowflake, optimizing data storage and
computation to handle over 10TB of data with dynamic scaling, significantly improving query
performance and user access.
Employed Apache Spark and Databricks for complex data processing and analytics, leveraging Spark
SQL to handle massive datasets efficiently.
Optimized Spark configurations within Databricks clusters to tailor resource allocation (executor
memory, cores) for specific workloads, achieving significant improvements in job performance and cost
efficiency.
Integrated Azure Data Factory and Databricks for automated data pipeline orchestration, facilitating seamless data ingestion from various sources into Delta Lake, supporting real-time and batch processing workflows.
Developed and maintained semantic data models to support knowledge representation and
reasoning, enabling advanced analytics and machine learning applications, which enhanced decision-
making processes.
Configured Databricks clusters to auto-scale based on workload demands, ensuring cost-effective use
of Azure resources while maintaining performance for ETL and data transformation tasks.
Implemented GraphQL APIs for seamless access to graph-based data stores, optimizing query
performance and enabling flexible, client-specific data fetching that improved application
responsiveness and user satisfaction.
Designed and implemented dimensional data models, facilitating efficient OLAP queries and
enhancing data analysis capabilities, resulting in a 20% increase in query performance.
Participated in DevOps activities, facilitating continuous integration and delivery (CI/CD) for data
applications and infrastructure as code (IaC) using Terraform.
Automated the scaling of Databricks clusters based on workload patterns using Azure Functions, ensuring that computational resources are dynamically adjusted to workload demands, optimizing cost and performance.
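Illustrative sketch only of the Delta Lake merge/update/delete pattern referenced above, assuming a Databricks notebook where spark is predefined and changes_df is a DataFrame of incoming change records; table, key, and flag names are hypothetical.

    # Minimal sketch: upsert a batch of change records into a Delta table,
    # deleting rows flagged for removal. Names are placeholders.
    from delta.tables import DeltaTable

    target = DeltaTable.forName(spark, "silver.customers")   # existing Delta table

    (target.alias("t")
     .merge(changes_df.alias("s"), "t.customer_id = s.customer_id")
     .whenMatchedDelete(condition="s.op = 'DELETE'")   # drop rows marked deleted
     .whenMatchedUpdateAll()                           # apply updates to existing rows
     .whenNotMatchedInsertAll()                        # insert brand-new rows
     .execute())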
TECH STACK: Azure (Data Factory, Synapse Analytics, Functions, Data Lake Store, Data Catalog), LookML, Databricks, Apache Spark, Python, GraphQL, Snowflake, SQL, SCD, Hadoop, Git, Jira, Shell Scripting, Spark SQL, Parquet files, Data Cleansing, Data Warehousing, DevOps
CBRE, Dallas, TX (Remote) | Apr 2019 to Sep 2021
Role: Sr. BIG DATA ENGINEER
RESPONSIBILITIES:
Masterminded and optimized ETL jobs using Azure Data Factory, streamlining data integration and
transformation, which significantly improved data pipeline performance.
Orchestrated job scheduling and automation of data processes using Azure Logic Apps and Azure
Functions, enhancing operational efficiency.
Leveraged Azure Functions for serverless data operations, facilitating efficient data transformation
tasks.
Leveraged Databricks Auto Loader for efficient and incremental data ingestion from cloud storage services, minimizing latency in data availability and ensuring datasets are up to date for real-time analytics applications (illustrative sketch below).
Spearheaded the migration of legacy ETL processes to Databricks PySpark, optimizing data
transformation and loading speeds by 30% and enabling real-time data analytics.
Directed the use of Apache Airflow (hosted on Azure VMs and AKS) to orchestrate complex workflows, manage dependencies, and schedule jobs with precision.
Led the migration of traditional on-premises data warehouses to Snowflake, significantly enhancing data processing speed by leveraging Snowflake's multi-cluster, shared-data architecture and reducing infrastructure costs by 30%.
Utilized Databricks SQL Analytics to execute high-performance SQL queries on Delta Lake, providing
analysts and business users with a familiar SQL interface to explore and visualize big data with minimal
learning curve.
Utilized GraphQL schemas to define and enforce data structures and relationships, ensuring data
integrity and consistency across microservices architectures, and reducing data redundancy.
Leveraged the Looker IDE to efficiently develop and debug LookML code, incorporating version control
practices to manage changes and collaboration across the data team.
Enhanced data security by implementing field-level encryption within Delta Lake using Databricks,
securing sensitive information while allowing for analytics on encrypted data without compromising
privacy.
Constructed interactive visualizations and dashboards with Tableau, turning complex data into
actionable insights for business stakeholders.
Working knowledge of Python libraries such as NumPy, SciPy, Matplotlib, urllib2, pandas (DataFrames), and PyTables, enhancing data analysis and processing capabilities.
Expert in using statistical aggregate functions (such as COUNT, SUM, AVG, MIN, MAX, VAR, and
STDDEV) to derive meaningful insights from data.
Employed the LookML Validator to catch syntax errors and optimize code efficiency before
deployment, ensuring high-quality models and a seamless user experience.
Developed and managed data transformation and migration processes using Azure Data Factory for
optimized data flow between various sources.
Utilized SQL Runner for advanced data exploration and query performance testing, identifying
optimization opportunities that led to a 30% reduction in load times for critical dashboards.
Utilized Databricks REST API to manage clusters, jobs, and notebooks, enabling automation of data
workflows and integration with external applications and services for comprehensive data ecosystem
management.
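Illustrative sketch only of the Auto Loader incremental-ingestion pattern noted above, assuming a Databricks notebook where spark is predefined; the landing, schema, and checkpoint paths and the target table name are hypothetical.

    # Minimal sketch: Auto Loader picks up newly arrived JSON files from cloud
    # storage and appends them to a bronze table. Paths/table are placeholders.
    stream = (spark.readStream
              .format("cloudFiles")
              .option("cloudFiles.format", "json")
              .option("cloudFiles.schemaLocation", "/mnt/meta/schemas/orders")
              .load("/mnt/landing/orders/"))

    (stream.writeStream
     .option("checkpointLocation", "/mnt/meta/checkpoints/orders")
     .trigger(availableNow=True)     # drain all newly arrived files, then stop
     .toTable("bronze.orders"))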
TECH STACK: Azure (VMs, Blob Storage, Data Factory, Cosmos DB, SQL Database, Synapse Analytics, Functions, Stream Analytics, Logic Apps), LookML, SQL Runner, Snowflake, SnowSQL, Snowpipe, GraphQL, Python, SQL, Airflow, Big Data, Data Modeling, Toad, Tableau, MongoDB, UNIX & Linux, Shell Scripting
Global Atlantic Financial Group, Indianapolis, IN | Feb 2017 to Mar 2019
Role: BIG DATA ENGINEER
RESPONSIBILITIES:
Crafted real-time data processing systems using Apache Spark Structured Streaming and Kafka,
integrated within the Azure ecosystem for improved data flow.
Utilized Azure HDInsight for managing big data processing clusters, employing Scala and PySpark for
executing complex transformations and aggregating financial data efficiently.
Developed and deployed robust ETL pipelines using Databricks PySpark, integrating data from
multiple sources into a cohesive data warehousing solution, enhancing data quality and availability.
Deployed Spark Streaming APIs to perform real-time transformations, building a common data model
to persist data from Confluent Kafka to Snowflake.
Engineered streaming applications with Python and PySpark to transfer data from Kafka topics directly into Snowflake, operating within the Azure cloud environment.
Demonstrated expertise in Data warehousing concepts, effectively managing OLTP and OLAP systems,
and implementing dimensional models including Facts and Dimensions to support complex business
intelligence requirements.
Utilized Airflow's dynamic DAG capabilities to programmatically generate and manage workflows,
enabling scalable and flexible data processing solutions tailored to business needs.
Optimized DataStage jobs for performance, focusing on parallel processing, effective memory
management, and ETL process tuning for large data volumes.
Designed and implemented data models and transformations using DBT, streamlining data workflows
and accelerating the delivery of data insights.
Utilized Azure Data Factory for ETL operations, automating data extraction, transformation, and loading
processes to enhance data quality and accessibility.
Proficient in setting up job sequences and schedules using the DataStage Scheduler for timely data availability.
Developed Spark applications focusing on data extraction, transformation, and aggregation from diverse file types for deep analysis of customer behavior.
Utilized Docker to containerize and manage data-intensive applications, ensuring scalable and consistent
performance across different environments.
Developed Kafka producers and consumers, HBase clients, and Spark jobs using Python, integrating
components on Azure Blob Storage and Azure Data Lake Storage.
Optimized query performance for complex SQL scripts involving statistical aggregates, ensuring fast
and efficient data retrieval.
Developed a Kafka producer API for streaming data into Kafka topics, and Spark Streaming applications to process and store data from those topics in HBase (illustrative sketch below).
Engaged in a Production Environment, building CI/CD pipelines with Jenkins, covering all stages from
GitHub code checkout to deployment in Azure environments.
Managed resilient and scalable data storage solutions using Azure Blob Storage and Azure Cosmos DB,
securing data and enabling efficient query performance.
Enhanced data analysis and reporting capabilities by utilizing Toad's Data Point tool for data preparation, analysis, and visualization, supporting data-driven decision-making with timely and accurate data insights.
Collaborated with cross-functional teams to align data engineering efforts with strategic goals,
delivering impactful data-driven solutions.
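Illustrative sketch only of the Kafka producer pattern described above, using the confluent-kafka Python client; the broker address, topic, key, and payload are hypothetical placeholders.

    # Minimal sketch: publish JSON events to Kafka and report delivery failures.
    import json
    from confluent_kafka import Producer

    producer = Producer({"bootstrap.servers": "broker:9092"})   # placeholder broker

    def on_delivery(err, msg):
        # Invoked by poll()/flush() once per message; log failures for alerting.
        if err is not None:
            print(f"Delivery failed for key {msg.key()}: {err}")

    def publish(topic, key, payload):
        producer.produce(topic,
                         key=key,
                         value=json.dumps(payload).encode("utf-8"),
                         callback=on_delivery)
        producer.poll(0)   # serve pending delivery callbacks

    publish("transactions", "cust-123", {"amount": 42.50, "currency": "USD"})
    producer.flush()       # block until all queued messages are delivered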
TECH STACK: Apache Spark, Hive, HBase, Confluent Kafka, Databricks, Azure HDInsight, LookML, Blob Storage, Azure Data Factory, Cosmos DB, GraphQL, Airflow, Spark Streaming, IBM DataStage, Toad, DBT, Docker, PySpark, Python, SQL, Scala, Big Data, Jenkins, Terraform, GitHub
Avon Technologies Pvt Ltd, Hyderabad, India | Sep 2015 to Jan 2017
Role: Sr. DATA ENGINEER
RESPONSIBILITIES:
Designed data integration solutions with Informatica PowerCenter, facilitating streamlined ETL
processes into a centralized data warehouse.
Managed Azure SQL Database clusters, applying SQL expertise to data warehousing tasks, thereby optimizing storage and enhancing query performance for significant data volumes.
Deployed real-time data processing pipelines employing Apache Kafka and Java, ensuring efficient data
ingestion and distribution suitable for analytics.
Developed scalable big data applications with Hadoop and Apache Spark, improving data processing
capabilities to support sophisticated analytical demands.
Automated system tasks in Unix, boosting efficiencies and reducing manual interventions within data
operations.
Implemented Airflow sensors to monitor the completion of upstream tasks, ensuring seamless workflow execution and minimizing delays in data processing pipelines (illustrative sketch below).
Oversaw Hadoop clusters managed by Cloudera, maintaining high availability and scalability for
essential data processes.
Engineered data transformations and routines in Python and Java, augmenting data utility for business
intelligence and machine learning applications.
Utilized Azure Functions and Azure Virtual Machines for deploying and managing applications,
capitalizing on cloud scalability to meet fluctuating data needs.
Enhanced data integration capabilities with Talend, complementing Informatica PowerCenter's
functionality and improving data integrity.
Orchestrated efficient data workflows with Sqoop, facilitating the transfer of bulk data between Hadoop
and relational databases like Oracle and MySQL.
Generated accessible data insights for stakeholders through intuitive visualizations and dashboards
created with Tableau, aiding in informed decision-making.
Executed thorough data audits and upheld governance standards to ensure data accuracy and
regulatory adherence.
Fostered continuous improvement in data practices by aligning data engineering projects with
organizational objectives, in collaboration with cross-functional teams.
Developed comprehensive documentation for data models, ETL workflows, and data management
protocols to ensure transparency and facilitate knowledge sharing.
Educated junior engineers and team members on data management best practices, enhancing team
proficiency in using Informatica PowerCenter and Azure SQL Database.
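Illustrative sketch only of the upstream-completion sensor pattern mentioned above, written against Airflow 2.x; the DAG ids, task ids, schedule, and script path are hypothetical placeholders.

    # Minimal sketch: wait for an upstream DAG's task to finish before running
    # the downstream warehouse load.
    from datetime import datetime

    from airflow import DAG
    from airflow.operators.bash import BashOperator
    from airflow.sensors.external_task import ExternalTaskSensor

    with DAG(
        dag_id="load_after_ingest",
        start_date=datetime(2024, 1, 1),
        schedule_interval="@daily",
        catchup=False,
    ) as dag:
        wait_for_ingest = ExternalTaskSensor(
            task_id="wait_for_ingest",
            external_dag_id="ingest_raw_data",   # upstream DAG (placeholder)
            external_task_id="land_files",       # upstream task (placeholder)
            poke_interval=300,                   # check every 5 minutes
            timeout=6 * 60 * 60,                 # give up after 6 hours
            mode="reschedule",                   # free the worker slot between checks
        )

        load_warehouse = BashOperator(
            task_id="load_warehouse",
            bash_command="python /opt/jobs/load_warehouse.py",   # placeholder script
        )

        wait_for_ingest >> load_warehouse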
TECH STACK: Hadoop, Apache Spark, Kafka, Informatica PowerCenter, Oracle, MySQL, Python, Talend, Airflow,
Tableau, Azure Virtual Machines, Azure Functions, Azure SQL Database, Data Warehousing, Cloudera, Unix
Scripting, Hive, Java, Sqoop.
Hudda InfoTech Private Limited, Hyderabad, India | Jan 2014 to Aug 2015
Role: DATA ENGINEER
RESPONSIBILITIES:
Led real-time data processing and intraday analysis using Apache Spark, harnessing its in-memory
processing to enhance performance.
Developed and managed robust ETL processes with Informatica, ensuring efficient data integration and
transformation.
Mastered ETL processes using Informatica PowerCenter, automating data integration tasks that
improved data quality and workflow efficiency.
Orchestrated complex data flow pipelines, leveraging the power of traditional relational database
management systems like SQL Server and Oracle.
Implemented CI/CD pipelines using Jenkins and managed Hadoop workflows with Apache Oozie,
automating deployment processes and batch jobs to enhance productivity.
Created complex transformations and data flows using Informatica PowerCenter, aligning with
business intelligence requirements.
Conducted ETL operations from diverse source systems to centralized relational databases using T-SQL
and PL/SQL for analytical processing.
Optimized SQL queries and database schemas in SQL Server and Oracle, enhancing performance
and reliability for critical financial transaction systems.
Managed a large Hadoop cluster with HDFS, optimizing storage and computing resources to handle
petabyte-scale data efficiently, reducing operational costs by 20%.
Authored Python and Bash scripts to automate data transformation and loading processes, focusing on
on-premises data platforms.
Utilized the Hadoop ecosystem, including Hive and HDFS, to manage large-scale data storage and table creation with HiveQL and the Scala API (illustrative sketch below).
Configured and managed workflow orchestration with Apache Oozie, developing complex workflows in
Python.
Deployed big data applications using Talend Open Studio, focusing on data provenance and lineage
through systems like Teradata and IBM DB2.
Crafted interactive dashboards in Tableau Desktop, facilitating real-time data analysis and decision-
making for end users.
Executed data mapping and transformation processes for comprehensive data extraction and loading.
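Illustrative sketch only, in the spirit of the HiveQL work above: creating and loading a partitioned Hive table through Spark SQL. The database, table, columns, and staging path are hypothetical placeholders.

    # Minimal sketch: create a partitioned Hive table and append one day's
    # extract into a static partition via Spark SQL.
    from pyspark.sql import SparkSession

    spark = (SparkSession.builder
             .appName("hive-partition-load")
             .enableHiveSupport()          # lets Spark read/write the Hive metastore
             .getOrCreate())

    spark.sql("""
        CREATE TABLE IF NOT EXISTS finance.transactions (
            txn_id     STRING,
            account_id STRING,
            amount     DECIMAL(18, 2)
        )
        PARTITIONED BY (txn_date STRING)
        STORED AS PARQUET
    """)

    # Stage the daily extract and insert it into the matching partition.
    daily_df = spark.read.parquet("/data/staging/transactions/2015-01-15/")
    daily_df.createOrReplaceTempView("staged_txns")

    spark.sql("""
        INSERT INTO TABLE finance.transactions PARTITION (txn_date = '2015-01-15')
        SELECT txn_id, account_id, amount FROM staged_txns
    """)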
TECH STACK: Apache Spark, Hadoop, PySpark, HDFS, Cloudera, Informatica PowerCenter, SQL Server, Oracle,
Hive, Jenkins, Teradata, IBM DB2, Shell Scripting, Tableau Desktop, Apache Oozie, Talend Open Studio.