Candidate's Name
PHONE NUMBER AVAILABLE
EMAIL AVAILABLE
https://LINKEDIN LINK AVAILABLE
Senior Data Engineer
PROFESSIONAL SUMMARY:
Senior Data Engineer with over 9 years of experience, specializing in data engineering and analytics.
Proven proficiency in designing and implementing scalable data solutions across diverse domains, utilizing Azure
services such as Azure Data Factory, Azure SQL Database, and Azure Databricks, as well as Snowflake, along with a strong
background in AWS.
Expertise encompasses data warehousing, ETL processes, and data pipeline development, utilizing tools like Azure
Data Factory, AWS Glue, Apache Airflow, and Talend.
Well-versed in multiple database technologies, including Azure SQL Database, PostgreSQL, Oracle Database, AWS
Redshift, and Snowflake.
Deep understanding of big data technologies, including Azure Databricks, Apache Hadoop, and Apache Kafka.
Proficient in data streaming and processing using Azure Stream Analytics, Apache Flink, Apache NiFi, and Snowflake.
Experienced in cloud technologies and services such as Azure and AWS.
Skilled in developing and deploying data solutions using Python, SQL, R, and other programming languages, with a
primary focus on Azure ML tools.
Expert in data visualization and reporting tools such as Power BI, Tableau, and QlikView, with a particular focus on
Power BI in Azure environments.
Demonstrated expertise in implementing data security and compliance measures, particularly in Azure
environments.
Thorough understanding of agile methodologies and DevOps practices, including continuous integration and
deployment using Azure DevOps.
Capable leader and mentor in fast-paced environments, ensuring the timely delivery of high-quality data solutions.
Exceptional problem-solving skills and meticulous attention to detail.
Proven track record of delivering high-quality data solutions on time and within budget.
Strong communication and collaboration skills, with the ability to work effectively with cross-functional teams in both
Azure and AWS environments.
Continuous learner with up-to-date knowledge of the latest trends and technologies in data engineering.
Experience in the financial services and healthcare domains, providing domain-specific data solutions.
Demonstrated ability to manage large datasets and handle complex data integration tasks.
Proficient in automated data pipeline development using tools like Azure Data Factory, Apache Airflow, Talend, and
Snowflake.
Adept in machine learning operations and analytics using MLflow and other ML tools, with a focus on Azure Machine
Learning.
TECHNICAL SKILLS:
Programming Languages: Python, SQL, R, PL/SQL, Scala
Big Data Technologies: Apache Hadoop, Apache Kafka, Apache Flink, Apache NiFi, Databricks
Database Technologies: PostgreSQL, Oracle Database, AWS Redshift, DynamoDB, Azure Cosmos DB,
Cassandra
Data Warehousing: AWS Redshift, Azure Synapse Analytics, Apache Hive
ETL Tools: Apache Airflow, Data Pipeline, Informatica PowerCenter, AWS Glue, Talend, Azure Data Factory,
Salesforce, Snowflake, Spark
Cloud Platforms: AWS, Azure, Docker, Kubernetes
Data Lake and Storage: Azure Data Lake Store Gen2, Amazon S3
Data Visualization: Tableau, QlikView, Power BI, Platfora
DevOps Tools: GitLab, Jenkins, Terraform, Azure DevOps, Ansible
Machine Learning Operations: MLflow, Azure Machine Learning
Other technologies: Apache Drill, Apache Sentry, Sqoop, AWS Lambda, AWS X-Ray, AWS Glue, Amazon EMR,
Amazon SNS, SQS, Elasticsearch, EC2, IAM, RDS, CloudWatch, SAM
Agile Methodologies and Project Management
PROFESSIONAL EXPERIENCE:
Client: Computomic, New Jersey, NJ. Apr 2022 to Present
Role: Senior Data Engineer
Project Description: Led a proficient data engineering team in the conception and execution of scalable solutions,
harnessing the capabilities of Azure, Snowflake, Salesforce, Talend, and Power BI. Demonstrated expertise in fine-
tuning ETL processes and refining data pipelines within the Azure ecosystem, with a specialized emphasis on
enhancing data warehousing functionalities.
Roles & Responsibilities:
Developed and maintained an efficient data pipeline architecture within Microsoft Azure, employing tools such
as Data Factory and Azure Databricks.
Developed architectural solutions that incorporated Talend for robust ETL processes and Power BI for advanced
reporting, tailored to the specific requirements of Chevron's use case.
Crafted user-friendly technical solutions, ensuring clarity and acceptance among stakeholders.
Conducted client education sessions on the advantages and drawbacks of various Azure PaaS and SaaS solutions,
with a focus on prioritizing cost-effective approaches.
Implemented self-service reporting in Azure Data Lake Store Gen2 through an ELT approach, optimizing data
processing efficiency.
Applied Spark Vectorized pandas user-defined functions via Talend for intricate data manipulation and
wrangling.
Executed a staged data transfer approach, systematically moving data from System of Records to raw, refined,
and produced zones to facilitate efficient translation and denormalization.
Established Azure infrastructure components, including storage accounts, integration runtimes, service principal
IDs, and app registrations, to support scalable and optimized analytical workloads.
Wrote PySpark and Spark SQL transformations in Azure Databricks for intricate business rule implementations,
seamlessly integrating Talend for enhanced capabilities (a representative sketch follows this section).
Developed Data Factory pipelines proficiently for bulk copying multiple tables from relational databases to
Azure Data Lake Gen2.
Engineered a custom logging framework for ELT pipeline logging in Data Factory using Append variables.
Enabled monitoring and employed Azure Log Analytics to proactively alert support teams about the usage and
statistics of daily runs.
Spearheaded proof of concept projects from ideation to production pipelines, delivering tangible business value
leveraging Azure Data Factory, Talend, and Power BI.
Ensured secure data separation across national boundaries through multiple data centers and regions.
Applied continuous integration/continuous development best practices using Azure DevOps, incorporating code
versioning, and deploying with Ansible playbook.
Delivered denormalized data for Power BI consumers from the produced layer in Data Lake, enriching modeling
and visualization experiences.
Collaborated seamlessly in a SAFe (Scaled Agile Framework) team, actively participating in daily stand-ups, sprint
planning, and quarterly planning sessions.
Environment: Azure Data Factory, Azure Databricks, Azure Data Lake Store Gen2, Azure Log Analytics, Azure DevOps,
Talend, Power BI, PySpark, Spark SQL, Spark vectorized pandas user-defined functions, Data Factory pipelines,
Ansible playbooks, Scaled Agile Framework (SAFe).
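The following is a minimal, illustrative sketch of the PySpark and Spark SQL transformation pattern described in this section, combining a vectorized pandas user-defined function with a Spark SQL aggregation between Data Lake zones; the table, column, and storage-account names are hypothetical, not project specifics.

```python
# Minimal sketch of a Databricks-style PySpark transformation with a
# vectorized (pandas) UDF. Table, column, and path names are hypothetical.
import pandas as pd
from pyspark.sql import SparkSession, functions as F
from pyspark.sql.functions import pandas_udf
from pyspark.sql.types import DoubleType

spark = SparkSession.builder.appName("refined-zone-transform").getOrCreate()

# Read from the raw zone in Azure Data Lake Storage Gen2 (path is illustrative).
raw = spark.read.format("delta").load("abfss://raw@datalake.dfs.core.windows.net/sales")

@pandas_udf(DoubleType())
def normalize_amount(amount: pd.Series) -> pd.Series:
    # Vectorized column-wise cleanup executed in Arrow batches.
    return (amount.fillna(0.0) / 100.0).round(2)

refined = (
    raw.withColumn("amount_usd", normalize_amount(F.col("amount_cents")))
       .filter(F.col("amount_usd") > 0)
)

# Spark SQL view for downstream business-rule queries.
refined.createOrReplaceTempView("sales_refined")
summary = spark.sql("""
    SELECT region, SUM(amount_usd) AS total_usd
    FROM sales_refined
    GROUP BY region
""")

# Land the result in the produced zone for Power BI consumption.
summary.write.mode("overwrite").format("delta") \
       .save("abfss://produced@datalake.dfs.core.windows.net/sales_summary")
```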
Client: Medtronic, Minneapolis, MN Sep 2020 to Apr 2022
Role: Big Data Engineer
Project Description: Led a pioneering healthcare data engineering initiative at Medtronic, leveraging Azure cloud
services to orchestrate efficient pipelines with a focus on Hadoop, Databricks, and Apache Flink. Managed Azure
Databricks clusters to ensure robust ETL processes, maintained data integrity in PostgreSQL, and automated
workflows with Apache Airflow. Azure was integral to incorporating Snowflake for advanced data warehousing
and analytics capabilities.
Roles & Responsibilities:
Developed and deployed a versatile ETL framework for efficient data extraction from diverse sources using
Spark, with a specific focus on Azure Databricks clusters.
Utilized Platfora for data visualization on Hadoop, creating Lens and Viz boards for real-time insights, while
leveraging Azure services for seamless integration.
Executed data queries and analyses in Cassandra, applying various Data Modeling techniques tailored for
Cassandra databases, with considerations for Azure compatibility.
Used Spark and Scala for joining multiple tables in Cassandra, enabling seamless analytics on consolidated
datasets within an Azure environment.
Engaged in enterprise-wide upgrades, troubleshooting, and performance tuning for Hadoop clusters, including
those hosted on Azure.
Configured Apache Drill on Hadoop for seamless data integration across SQL and NoSQL databases, taking
advantage of Azure capabilities for enhanced connectivity.
Orchestrated data ingestion into Hadoop and Cassandra using Kafka from diverse sources, ensuring
compatibility with Azure data storage solutions.
Utilized Tidal Enterprise Scheduler and Oozie Operational Services for effective cluster coordination and
workflow scheduling, with considerations for Azure cloud infrastructure.
Implemented Spark streaming for real-time data transformation, considering Azure services for optimal
scalability and performance (see the illustrative sketch at the end of this section).
Designed and created Tableau dashboards to address diverse business needs, incorporating Azure connectors
for seamless data access.
Installed and configured Hive, wrote Hive UDFs, and utilized Piggybank repository for Pig Latin, ensuring
compatibility with Azure-based data ecosystems.
Implemented Partitioning, Dynamic Partitions, and Buckets in Hive to enhance data access efficiency,
considering Azure storage optimization.
Employed Sqoop to export analyzed data to relational databases for BI team visualization in Tableau, considering
Azure database services for integration.
Implemented a Composite server for data virtualization needs, creating multiple views with restricted data
access through a REST API, considering Azure API services.
Led the conception and implementation of the next-generation architecture, optimizing data ingestion and
processing efficiency, with a focus on Azure cloud services.
Developed and implemented various shell scripts for job automation, considering Azure automation tools and
scripts.
Implemented Apache Sentry to restrict Hive table access on a group level, ensuring compatibility with Azure
security protocols.
Utilized AVRO format for comprehensive data ingestion, enhancing operational speed and minimizing space
utilization, with considerations for Azure storage efficiency.
Proficiently managed and reviewed Hadoop log files, incorporating Azure monitoring and logging solutions.
Operated in an Agile environment, utilizing the Rally tool for maintaining user stories and tasks, with integration
capabilities for Azure DevOps.
Collaborated with Enterprise data support teams for Hadoop updates, patches, and version upgrades, ensuring
seamless integration with Azure services.
Implemented test scripts for test-driven development and continuous integration, with Azure-compatible
testing frameworks.
Leveraged Spark for parallel data processing, achieving enhanced performance outcomes, with considerations
for Azure parallel processing capabilities.
Environment: SQL, NoSQL, PostgreSQL, Apache Spark, Azure Databricks, Platfora, Tableau, Apache Cassandra, Scala,
Hadoop clusters, Apache Drill, Azure Blob Storage, Azure DevOps, Apache Sentry, REST API, Azure API services, Sqoop,
Apache Kafka.
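Below is a minimal sketch of the Kafka-to-Spark streaming ingestion pattern referenced above; the broker addresses, topic name, event schema, and output paths are illustrative assumptions rather than project details.

```python
# Minimal sketch of Kafka ingestion with Spark Structured Streaming.
# Broker addresses, topic, schema, and output paths are hypothetical.
from pyspark.sql import SparkSession, functions as F
from pyspark.sql.types import StructType, StructField, StringType, DoubleType, TimestampType

spark = SparkSession.builder.appName("device-telemetry-stream").getOrCreate()

event_schema = StructType([
    StructField("device_id", StringType()),
    StructField("reading", DoubleType()),
    StructField("event_time", TimestampType()),
])

# Subscribe to the telemetry topic.
stream = (
    spark.readStream.format("kafka")
         .option("kafka.bootstrap.servers", "broker1:9092,broker2:9092")
         .option("subscribe", "device-telemetry")
         .option("startingOffsets", "latest")
         .load()
)

# Kafka delivers bytes; parse the JSON payload into typed columns.
parsed = (
    stream.select(F.from_json(F.col("value").cast("string"), event_schema).alias("e"))
          .select("e.*")
)

# Simple real-time transformation: 5-minute average reading per device.
averages = (
    parsed.withWatermark("event_time", "10 minutes")
          .groupBy(F.window("event_time", "5 minutes"), "device_id")
          .agg(F.avg("reading").alias("avg_reading"))
)

# Write the aggregates to Parquet (could equally target Cassandra or ADLS).
query = (
    averages.writeStream.outputMode("append")
            .format("parquet")
            .option("path", "/data/refined/device_averages")
            .option("checkpointLocation", "/data/checkpoints/device_averages")
            .start()
)
query.awaitTermination()
```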
Client: Edward Jones, St. Louis, MO Oct 2017 to Aug 2020
Role: AWS Data Engineer
Project Description: As an AWS-centric Data Engineer, I played a key role in leading a pioneering AWS
initiative. I specialized in constructing and overseeing extensive pipelines in the AWS landscape, leveraging the
capabilities of Apache Airflow and Data Pipeline. My duties encompassed coordinating ETL processes and
optimizing data flows in both AWS Redshift and Apache Hadoop. Particularly noteworthy is my seamless
integration of Salesforce, contributing to a comprehensive and unified approach to data integration.
Roles & Responsibilities:
Implemented a 'serverless' architecture using AWS components, including API Gateway, Lambda, and
DynamoDB, facilitating seamless deployment of AWS Lambda code from Amazon S3 buckets (see the illustrative
sketch at the end of this section).
Designed and configured Lambda functions to receive events from S3 buckets, while creating robust data
models for data-intensive AWS Lambda applications. These applications aimed at performing complex analyses
and generating analytical reports, ensuring end-to-end traceability, and defining Key Business elements from
Aurora.
Wrote optimized code to enhance the performance of AWS services, addressing the needs of application
teams. Ensured Code-level application security for clients by implementing IAM roles, credentials, and
encryption strategies.
Developed AWS Lambda functions using Python for efficient deployment management within the AWS
ecosystem. Designed and implemented public-facing websites on Amazon Web Services, seamlessly
integrating them with other applications' infrastructure.
Created diverse AWS Lambda functions and API Gateways, enabling data submission through API Gateway
accessible via Lambda functions.
Led the creation of Cloud Formation templates for various AWS services, including SNS, SQS, Elasticsearch,
DynamoDB, Lambda, EC2, VPC, RDS, S3, IAM, and CloudWatch. Ensured seamless integration with Service
Catalog.
Conducted regular monitoring activities on Unix/Linux servers, ensuring application availability and
performance. Monitored logs, CPU usage, memory, load, and disk space using CloudWatch and AWS X-Ray.
Implemented AWS X-Ray service within Confidential for visualizing node and edge latency distribution directly
from the service map.
Designed and developed ETL processes in AWS Glue, facilitating the migration of data from external sources
like S3, ORC/Parquet/Text Files into AWS Redshift.
Utilized Python libraries, including Boto3 and NumPy, for AWS operations. Employed Amazon EMR for
MapReduce jobs, testing locally using Jenkins.
Created external tables with partitions using Hive, AWS Athena, and Redshift. Developed PySpark code for
AWS Glue jobs and for EMR.
Demonstrated proficiency in other AWS services like S3, EC2, IAM, and RDS. Experienced in orchestration and
Data Pipeline using AWS Step Functions, Data Pipeline, and Glue.
Wrote SAM templates to deploy serverless applications on the AWS cloud.
Environment: API Gateway, Lambda, DynamoDB, Amazon S3, Aurora, AWS X-Ray, SNS, SQS, Elasticsearch,
CloudWatch, AWS Glue, AWS Redshift, Boto3, NumPy, Amazon EMR, Hive, AWS Athena, PySpark, AWS SAM
(Serverless Application Model)
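The sketch below illustrates the S3-triggered Lambda pattern referenced above, persisting object metadata to DynamoDB via Boto3; the table name and item attributes are hypothetical assumptions for illustration, not the project's actual resources.

```python
# Minimal sketch of an S3-triggered AWS Lambda handler persisting object
# metadata to DynamoDB. The table name and attribute names are hypothetical.
import json
import urllib.parse

import boto3

dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table("ingested-objects")  # hypothetical table name


def lambda_handler(event, context):
    """Invoked by S3 ObjectCreated events; records each new object."""
    records = event.get("Records", [])
    for record in records:
        bucket = record["s3"]["bucket"]["name"]
        key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])
        size = record["s3"]["object"].get("size", 0)

        # Persist a traceability row for downstream analytical reporting.
        table.put_item(
            Item={
                "object_key": key,
                "bucket": bucket,
                "size_bytes": size,
                "event_time": record["eventTime"],
            }
        )

    return {"statusCode": 200, "body": json.dumps({"processed": len(records)})}
```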
Client: Synechron Technologies Pvt. Ltd, Hyderabad, India Jan 2016 to May 2017
Role: SQL Developer
Project Description: Led comprehensive SQL development initiatives, mastering Oracle Database and PL/SQL for
optimal database management and integrity. Developed sophisticated data integration solutions with Informatica
PowerCenter and facilitated real-time processing using Apache Kafka.
Roles & Responsibilities:
Mastered Oracle Database and PL/SQL for efficient database management and optimization. Implemented performance
tuning strategies, improving query response times.
Developed complex solutions using Informatica PowerCenter for seamless data flow. Ensured scalability and
maintainability of data integration workflows.
Implemented Snowflake data warehousing solutions, optimizing storage and retrieval of extensive datasets.
Led real-time processing initiatives using Apache Kafka for dynamic data streams. Integrated Kafka into existing
architectures, enabling real-time analytics (see the illustrative sketch at the end of this section).
Utilized Jenkins for CI/CD, establishing automated workflows for faster delivery. Implemented version control
strategies for a reliable development process.
Created advanced dashboards in Tableau for interactive data representations. Leveraged Tableau features for
complex data analysis and trend identification.
Managed Docker containers for efficient deployment across environments. Orchestrated containerized
solutions, reducing setup and configuration times.
Implemented agile practices, fostering adaptive and collaborative workflows. Emphasized regular sprints,
feedback loops, and continuous improvement.
Maintained comprehensive documentation for clarity and knowledge transfer. Advocated for data security and
privacy best practices.
Contributed to strategic planning for future data initiatives. Focused on fostering a culture of innovation and
continuous learning.
Environment: Oracle DB, PL/SQL, Informatica PowerCenter, Apache Kafka, Snowflake, Jenkins, Tableau, Docker, SQL.
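Below is a minimal sketch of publishing events to Kafka for real-time analytics, as referenced above, using the kafka-python client; the broker address, topic name, and payload fields are illustrative assumptions.

```python
# Minimal sketch of publishing change events to Kafka for real-time analytics
# using the kafka-python client. Broker, topic, and payload fields are hypothetical.
import json
from datetime import datetime, timezone

from kafka import KafkaProducer

producer = KafkaProducer(
    bootstrap_servers="broker1:9092",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

def publish_trade_event(trade_id: str, symbol: str, amount: float) -> None:
    """Send one trade record to the downstream streaming pipeline."""
    event = {
        "trade_id": trade_id,
        "symbol": symbol,
        "amount": amount,
        "published_at": datetime.now(timezone.utc).isoformat(),
    }
    producer.send("trade-events", value=event)

# Example usage
publish_trade_event("T-1001", "ACME", 2500.75)
producer.flush()
```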
Client: eClerx Services Ltd, Hyderabad, India Jun 2014 to Dec 2015
Role: Data Analyst
Project Description: Led data analysis and reporting initiatives, leveraging Microsoft Excel for advanced analysis
and PostgreSQL databases for efficient storage. Developed Python scripts for complex processing, implemented data
warehousing with Apache Hive, and created interactive Tableau dashboards for insightful analytics.
Roles & Responsibilities:
Conducted advanced data analysis and reporting using Microsoft Excel, optimizing insights for decision-making.
Managed and optimized PostgreSQL databases, ensuring efficient data storage for large datasets.
Developed Python scripts (pandas, NumPy) for complex data processing tasks, enhancing automation (see the illustrative sketch at the end of this section).
Implemented data warehousing solutions using Apache Hive, accommodating the storage needs of extensive
datasets.
Created interactive dashboards and reports in Tableau, providing insightful analytics for stakeholders.
Utilized GitLab for version control, ensuring effective tracking of changes in data projects.
Advocated for and ensured high data quality and accuracy in all analysis projects.
Collaborated with business teams to understand and meet data requirements, aligning solutions with
organizational goals.
Maintained comprehensive documentation for all data processes and systems, promoting knowledge sharing.
Advocated for data-driven decision-making within the organization, fostering a culture of data-driven insights.
Fostered a collaborative environment for data analysis and reporting, encouraging innovation and best
practices.
Environment: PostgreSQL, Excel, Python, Hive, Tableau, GitLab, QlikView.
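The sketch below illustrates the kind of pandas/NumPy processing referenced above: reading from PostgreSQL, cleaning the data, and exporting a monthly summary for downstream dashboards; the connection string, table, and column names are hypothetical.

```python
# Minimal sketch of pandas/NumPy processing over PostgreSQL data.
# Connection string, table, and column names are hypothetical.
import numpy as np
import pandas as pd
from sqlalchemy import create_engine

engine = create_engine("postgresql+psycopg2://analyst:secret@db-host:5432/analytics")

# Load the raw orders table.
orders = pd.read_sql("SELECT order_id, region, amount, order_date FROM orders", engine)

# Basic cleanup: drop rows without an amount, coerce types.
orders = orders.dropna(subset=["amount"])
orders["order_date"] = pd.to_datetime(orders["order_date"])
orders["amount"] = orders["amount"].astype(float)

# Monthly totals per region plus a simple NumPy z-score outlier flag.
orders["month"] = orders["order_date"].dt.to_period("M")
monthly = orders.groupby(["region", "month"], as_index=False)["amount"].sum()
monthly["zscore"] = (monthly["amount"] - monthly["amount"].mean()) / np.std(monthly["amount"])
monthly["is_outlier"] = monthly["zscore"].abs() > 3

# Export for Tableau / Excel reporting.
monthly.to_csv("monthly_totals.csv", index=False)
```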
Education:
Bachelor of Technology (B.Tech) in Information Technology from JNTUH, Hyderabad, Telangana, India, 2014.