SENIOR DATA ENGINEER
Name: Venkata
Phone: PHONE NUMBER AVAILABLE
Email: EMAIL AVAILABLE
LINKEDIN LINK AVAILABLE

PROFESSIONAL SUMMARY:
- 10 years of experience in data engineering, specializing in building, optimizing, and maintaining robust data pipelines using Spark, Hadoop, and AWS, Azure, and GCP technologies.
- Proficient in managing big data technologies such as Hive, Pig, HBase, and Sqoop, with extensive experience in data ingestion, storage, and retrieval processes.
- Advanced skills in real-time data processing using Spark Streaming, Kafka, and Azure Event Hubs, enhancing data flow and streamlining analytics operations.
- Deep expertise in cloud-based data solutions, managing AWS services such as EC2, S3, Glacier, Redshift, Data Pipelines, and Glue for scalable data storage and processing.
- Developed complex ETL processes and workflows using Informatica 6.1 and Data Flux, ensuring data quality and consistency across diverse sources.
- Experienced with SQL databases; proficient in SQL, T-SQL, MySQL, and Teradata, optimizing queries and managing database operations efficiently.
- Implemented big data solutions on multiple platforms, including Cloudera, Azure Synapse, Azure Data Lake, and GCP BigQuery, adapting to various enterprise needs.
- Designed and executed data migration strategies using Azure Data Factory and AWS Data Pipelines, ensuring seamless data transfers with minimal downtime.
- Utilized Scala IDE (Eclipse) and Python for script writing and data manipulation tasks, boosting automation and reducing manual data handling.
- Expertise in data visualization and business intelligence tools such as Power BI, Tableau, QlikView, and Data Studio, delivering insightful dashboards and reports.
- Managed data security and compliance using IAM Security, Service Data Transfer, and VPN configurations, safeguarding sensitive data and adhering to regulatory standards.
- Experience with DevOps and collaboration tools such as GitHub, Jira, and SharePoint for version control and project management, enhancing team collaboration and project tracking.
- Skilled in Linux shell scripting and automation, streamlining operations and enhancing system performance in Linux-based environments.
- Conducted performance tuning and database optimization using SQL Profiler and Database Engine Tuning Advisor, ensuring optimal data processing speeds and system efficiency.
- Integrated federated queries and managed multiple databases with technologies such as Cloud SQL, Federated Queries, and VPC configuration, expanding data accessibility and integration.
- Deployed and managed data analytics solutions on Google Cloud Platform, utilizing GCP Databricks, Pub/Sub, and GCS buckets for enhanced data handling and analysis.
- Played a key role in driving successful data initiatives through collaborative problem-solving, communication, and knowledge sharing across teams.
- Applied clustering techniques such as K-means and hierarchical clustering to segment and analyze large datasets, uncovering meaningful patterns and insights for decision support.
- Implemented CI/CD best practices using tools such as Jenkins, GitLab CI/CD, and AWS CodePipeline to automate code integration, testing, and deployment for data engineering projects.
- Developed ETL jobs using AWS Glue to extract, transform, and load data from various sources into AWS data lakes and data warehouses.
- Utilized PySpark to perform complex data processing tasks such as data cleansing, aggregation, and enrichment on distributed computing clusters.
- Wrote Terraform configuration files (HCL) to define infrastructure resources on cloud platforms such as AWS, Azure, and Google Cloud.
- Experience in monitoring, logging, and troubleshooting workflows in Airflow (a minimal DAG sketch follows this summary).
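The summary above mentions building and troubleshooting Airflow workflows; below is a minimal sketch of the kind of DAG involved, assuming Airflow 2.x. The DAG id, task name, schedule, and owner are hypothetical and not taken from any project described in this resume.

    from datetime import datetime, timedelta
    import logging

    from airflow import DAG
    from airflow.operators.python import PythonOperator

    log = logging.getLogger(__name__)

    def extract_and_load(**context):
        # Placeholder task body; a real job would pull from a source system
        # and land the data in a lake or warehouse.
        log.info("Run for %s started", context["ds"])

    default_args = {
        "owner": "data-engineering",
        "retries": 2,                          # retry failed tasks before alerting
        "retry_delay": timedelta(minutes=5),
    }

    with DAG(
        dag_id="daily_ingest_example",         # hypothetical DAG name
        start_date=datetime(2024, 1, 1),
        schedule_interval="@daily",
        catchup=False,
        default_args=default_args,
    ) as dag:
        PythonOperator(task_id="extract_and_load", python_callable=extract_and_load)

Task-level retries plus the scheduler's built-in logging give the monitoring and troubleshooting hooks referenced in the summary.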
Technical Skills:
Big Data Technologies: Spark, Hive, Pig, Spark SQL, Spark Streaming, HBase, Sqoop, Kafka, Hadoop, MapReduce, Cloudera, Hue
Cloud Platforms: AWS EC2, AWS S3, AWS Glacier, AWS Redshift, AWS Data Pipelines, AWS Glue, Azure Event Hubs, Azure Synapse, Azure Data Lake, Azure Data Factory, Azure Databricks, GCP, GCP BigQuery, Cloud SQL, Cloud Storage, GCP Databricks, OpenShift
Programming & Scripting: Scala, Python, Linux shell scripting, Scala IDE (Eclipse)
Database Management: Snowflake, Teradata, SQL, T-SQL, MySQL, SQL Server 2017, Informatica 6.1, Data Flux, Oracle 9i, Quality Center 8.2, TOAD, PL/SQL, Federated Queries, Data Catalog
BI & Visualization Tools: QlikView, Power BI, Tableau, Data Studio, SFDC
DevOps & Collaboration: Jira, GitHub, SharePoint, Windows 10, Tidal, VPN Google-Client, Pub/Sub, GCS Buckets
Data Security & Transfer: IAM Security, Service Data Transfer, VPC Configuration
Performance & Tuning: SQL Profiler, Database Engine Tuning Advisor, Matillion, BQ-ML

PROFESSIONAL EXPERIENCE

Amway Corp, Ada, MI
AWS Data Engineer    March 2022 to Present

Responsibilities:
- Developed and optimized big data solutions using Spark, Hive, and Pig to handle large-scale data processing, improving data ingestion and transformation workflows.
- Implemented Spark SQL and Spark Streaming for real-time data processing, enhancing decision-making by providing timely insights into sales and market trends.
- Managed HBase databases to ensure efficient data storage and retrieval, supporting high-volume, real-time applications critical to business operations.
- Utilized Sqoop to efficiently transfer data between Hadoop and structured datastores, significantly reducing the time required for data synchronization.
- Designed and maintained scalable, reliable data pipelines using Kafka, AWS Data Pipelines, and AWS Glue, facilitating seamless data flow across diverse systems.
- Configured and managed AWS EC2 instances for deploying and scaling applications, optimizing resource allocation based on computational needs.
- Leveraged AWS S3 for secure, scalable storage, implementing best practices for data lifecycle management and cost optimization.
- Employed Cloudera management tools to monitor and manage Hadoop clusters, ensuring high availability and performance of data services.
- Developed applications in Scala using Scala IDE (Eclipse), enhancing functionality and improving performance of data processing tasks.
- Wrote and maintained Linux shell scripts to automate routine data management tasks, increasing operational efficiency.
- Designed and implemented data models in HDFS to support scalable, distributed data architecture requirements.
- Utilized Python for scripting and automation of data processes, improving accuracy and efficiency in data transformation and analysis.
- Integrated the Snowflake cloud data platform to consolidate disparate data sources for analytical reporting, enhancing business intelligence capabilities.
- Developed interactive dashboards and reports using QlikView, providing actionable insights into customer behavior and operational performance.
- Parsed and processed complex data structures using JSON, improving the integration of web-based data into enterprise applications.
- Orchestrated containerized applications using OpenShift, enhancing deployment processes and environment consistency across development, testing, and production.
- Managed AWS Glacier for long-term data archiving, ensuring compliance with data retention policies and reducing storage costs.
- Optimized data warehousing solutions using AWS Redshift, enhancing query performance and supporting scalable analytics.
- Ensured data integrity and security by implementing robust backup and recovery processes, leveraging AWS technologies and best practices.
- Collaborated with cross-functional teams to translate business requirements into technical implementations, ensuring alignment with strategic objectives.
- Conducted performance tuning of Hadoop clusters and Spark jobs to meet SLAs and improve resource utilization.
- Led the migration of legacy systems to AWS platforms, ensuring seamless data integration and minimal downtime.
- Developed and maintained documentation on data architectures, solutions, and best practices to guide the team and stakeholders.
- Implemented robust data validation processes using Informatica and Data Flux to ensure data accuracy, completeness, and consistency.
- Streamlined development workflows, reduced deployment times, and minimized manual errors through automated CI/CD pipelines, ensuring faster delivery of data solutions and applications.
- Implemented data cataloging and metadata management using the AWS Glue Data Catalog to provide a unified view of data assets.
- Developed custom PySpark functions and transformations to handle specific data transformation and business logic requirements.
- Automated data pipelines in AWS Glue to process and transform large-scale datasets efficiently, improving data quality and reducing processing time.
- Optimized PySpark jobs for performance and scalability, leveraging techniques such as partitioning, caching, and broadcast variables (see the sketch after this section).
- Wrote Terraform configuration files (HCL) to define infrastructure resources on cloud platforms such as AWS, Azure, and Google Cloud.
- Experience in monitoring, logging, and troubleshooting workflows in Airflow.
- Integrated Lambda functions with AWS services such as API Gateway, S3, DynamoDB, and SQS.
- Strong knowledge of version control concepts and best practices using Git.
- Wrote Lambda functions in languages such as Python, Node.js, and Java for event-driven architectures.
- Used frameworks and tools such as Flask (Python), Express (Node.js), and Spring Boot (Java) for API development.

Environment: Spark, Hive, Pig, Spark SQL, Spark Streaming, HBase, Sqoop, Kafka, AWS EC2, AWS S3, Cloudera, Scala IDE (Eclipse), Scala, Linux Shell Scripting, HDFS, Python, Snowflake, QlikView, JSON, OpenShift, AWS Glacier, AWS Redshift, AWS Data Pipelines, AWS Glue.
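The PySpark tuning bullet above names partitioning, caching, and broadcast variables; below is a minimal, self-contained sketch of those three techniques. The bucket paths, table layout, and column names are hypothetical, not taken from Amway systems.

    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.appName("pyspark-tuning-sketch").getOrCreate()

    # Hypothetical inputs: a large fact table and a small dimension table.
    orders = spark.read.parquet("s3://example-bucket/orders/")      # large
    regions = spark.read.parquet("s3://example-bucket/regions/")    # small

    # Broadcast the small dimension so the join avoids a full shuffle.
    enriched = orders.join(F.broadcast(regions), on="region_id", how="left")

    # Repartition on the write key to control parallelism and file sizes,
    # and cache because the DataFrame feeds two downstream actions.
    enriched = enriched.repartition(200, "order_date").cache()

    daily_totals = enriched.groupBy("order_date").agg(F.sum("amount").alias("total"))
    daily_totals.write.mode("overwrite").parquet("s3://example-bucket/daily_totals/")

    enriched.write.mode("overwrite").partitionBy("order_date").parquet(
        "s3://example-bucket/orders_enriched/"
    )

Broadcasting is only appropriate when the smaller side fits comfortably in executor memory; otherwise the default shuffle join is the safer choice.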
Centene Corporation, St. Louis, Missouri
Azure Data Engineer    November 2019 to January 2022

Responsibilities:
- Utilized Azure Data Factory to automate data movement and transformation, enhancing workflow efficiency across healthcare datasets.
- Developed scalable data processing pipelines using Azure Databricks, improving data accessibility for analytics teams.
- Managed Azure Synapse Analytics to provide robust data warehousing solutions, supporting large-scale health data analysis.
- Integrated Hadoop and Hive environments to facilitate the storage and querying of big data, significantly reducing data retrieval times.
- Designed and maintained Azure Event Hubs for real-time data streaming, optimizing the capture of healthcare operational data (see the sketch after this section).
- Leveraged Power BI and Tableau to create interactive dashboards and reports, aiding decision-making processes.
- Utilized Azure Databricks for data transformation and processing, optimizing data flows to support real-time analytics and machine learning models.
- Monitored the performance of the Azure Data Lake environment, identifying and resolving bottlenecks in data processing and storage.
- Utilized Azure Monitor and Azure Log Analytics to track system performance, set up alerts for potential issues, and systematically optimize queries and data loads for improved efficiency.
- Administered SQL Server 2017 databases to ensure data accuracy and availability for critical health insurance applications.
- Employed MapReduce techniques to process large datasets efficiently, decreasing processing times by 30%.
- Conducted database optimizations using Database Engine Tuning Advisor and SQL Profiler, enhancing performance and scalability.
- Collaborated with development teams using Jira and GitHub to track enhancements and manage code repositories effectively.
- Supported data integration projects by configuring and managing Teradata systems, ensuring reliable data storage solutions.
- Utilized SQL and T-SQL for complex query development and database management, supporting backend data operations.
- Orchestrated Tidal for job scheduling, improving the execution of batch jobs and data workflows in a healthcare environment.
- Enhanced data security measures by implementing robust data governance practices using Hive and Hue.
- Facilitated team collaboration and document management using SharePoint, improving project communication and documentation.
- Configured Windows 10 environments to optimize software tools and applications for data engineering tasks.
- Optimized data transfer between SFDC and enterprise data warehouses, enabling more accurate customer data analysis.
- Maintained high data quality and integrity by regularly auditing and cleansing data using custom SQL scripts.
- Provided technical support and training to team members on Azure Databricks and Azure Data Factory, elevating team capabilities.
- Performed SQL database performance tuning and testing with SQL Profiler, ensuring optimal system efficiency.
- Coordinated with IT and business teams to align Azure Synapse solutions with corporate health insurance data strategies.
- Assisted in the migration of legacy systems to modern Azure platforms, ensuring seamless data transition and minimal downtime.
- Monitored and troubleshot data pipeline issues in real time using Azure Event Hubs, maintaining high availability and performance.
- Evaluated new technologies and tools to enhance the data architecture framework, focusing on scalability and reliability.
- Documented all processes and changes in project workflows in Jira and SharePoint, ensuring compliance with IT standards.
- Led end-to-end data engineering projects, overseeing the data lifecycle from acquisition to analysis and reporting, using technologies such as Spark, Hive, and AWS services.
- Collaborated effectively with diverse teams including data scientists, analysts, developers, and business stakeholders to align technical solutions with strategic business objectives.

Environment: Hadoop, Hive, MapReduce, Teradata, SQL, T-SQL, Azure Event Hubs, Azure Synapse, Azure Data Lake, Azure Data Factory, Azure Databricks, Hue, Power BI, Tableau, SQL Profiler, Database Engine Tuning Advisor, Jira, GitHub, SharePoint, Windows 10, Tidal, SQL Server 2017, SFDC.
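The Event Hubs bullets above describe capturing and troubleshooting a real-time stream; below is a minimal consumer sketch, assuming the azure-eventhub Python SDK. The connection string, hub name, and consumer group are hypothetical placeholders, and real credentials would come from Key Vault or pipeline configuration rather than source code.

    from azure.eventhub import EventHubConsumerClient

    CONNECTION_STR = "<event-hubs-namespace-connection-string>"  # placeholder
    EVENTHUB_NAME = "ops-events"                                 # hypothetical hub

    def on_event(partition_context, event):
        # Process one streamed record, then checkpoint so a restart
        # resumes from the last committed position.
        print(partition_context.partition_id, event.body_as_str())
        partition_context.update_checkpoint(event)

    client = EventHubConsumerClient.from_connection_string(
        CONNECTION_STR,
        consumer_group="$Default",
        eventhub_name=EVENTHUB_NAME,
    )

    with client:
        # starting_position="-1" reads from the beginning of each partition.
        client.receive(on_event=on_event, starting_position="-1")

In production a blob-backed checkpoint store would normally be attached so checkpoints survive process restarts.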
Homesite Insurance, Boston, MA
GCP Data Engineer    November 2017 to October 2019

Responsibilities:
- Designed and implemented data solutions using GCP services such as BigQuery, Cloud SQL, and Cloud Storage, enhancing data accessibility and reliability.
- Developed ETL processes using Matillion to transform and load data into BigQuery, optimizing data workflows for analytics and reporting.
- Implemented BQ-ML to develop and deploy machine learning models directly in GCP BigQuery, improving predictive analytics capabilities.
- Created interactive reports and dashboards using Data Studio and Snowflake, providing real-time insights into risk assessment and claims processing.
- Managed MySQL databases on Cloud SQL, ensuring high availability and performance for transactional data processing.
- Utilized federated queries in BigQuery to integrate external data sources, enhancing data analysis without moving data across systems.
- Implemented IAM security policies to manage access control and ensure data security across GCP platforms.
- Orchestrated data transfers between different environments using Service Data Transfer, ensuring efficient data synchronization.
- Developed and maintained Python scripts for data extraction, transformation, and loading tasks, increasing automation and reducing manual errors.
- Authored shell scripts to automate routine cloud management tasks, improving efficiency and system reliability.
- Configured VPC networks to secure data transfer between GCP services, ensuring compliance with data governance and privacy standards.
- Utilized Data Catalog to create a unified data governance framework for metadata management, enhancing data discovery and usability.
- Leveraged GCP Databricks for large-scale data processing and machine learning, significantly improving data processing capabilities.
- Managed secure connections using VPN Google-Client for remote data access, ensuring secure and reliable data exchange.
- Designed and implemented real-time data ingestion pipelines using Pub/Sub and GCS buckets, facilitating immediate data availability for analysis (see the sketch after this section).
- Developed robust data storage solutions using GCS buckets, optimizing data storage costs and performance.
- Collaborated with business analysts to translate business needs into scalable data solutions, ensuring alignment with business goals.
- Conducted data quality checks and performance tuning on GCP BigQuery and Cloud SQL instances to ensure optimal performance.
- Led the integration of legacy systems into GCP, ensuring seamless data migration with minimal disruption.
- Provided technical leadership and guidance to junior data engineers, fostering a collaborative team environment.
- Maintained documentation on system architectures, data flows, and best practices, enhancing knowledge sharing and system maintenance.
- Participated in security audits to assess and improve security frameworks and data protection mechanisms.
- Engaged in continuous learning about new GCP features and data engineering practices, maintaining cutting-edge knowledge in the field.
- Assisted in the development of disaster recovery plans, ensuring data integrity and availability in emergency scenarios.
- Actively contributed to team meetings and project planning sessions, offering insights and solutions to enhance project outcomes.
- Developed validation rules and scripts to perform quality checks on incoming data, enhancing overall data reliability and trustworthiness.
- Designed and implemented scalable data pipelines covering data collection, storage, processing, analysis, and visualization, ensuring seamless data flow and integrity.
- Leveraged clustering methodologies to optimize data processing workflows, improving data understanding and facilitating informed decision-making processes.

Environment: GCP, GCP BigQuery, Cloud SQL, Cloud Storage, Matillion, BQ-ML, Data Studio, Snowflake, MySQL, Federated Queries, IAM Security, Service Data Transfer, Python, shell scripts, VPC Configuration, Data Catalog, GCP Databricks, VPN Google-Client, Pub/Sub, GCS Buckets.
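The real-time ingestion bullet above pairs Pub/Sub with GCS landing buckets; below is a minimal sketch of that pattern, assuming the google-cloud-pubsub and google-cloud-storage client libraries. The project, subscription, and bucket names are hypothetical and would come from deployment configuration in practice.

    import uuid

    from google.cloud import pubsub_v1, storage

    PROJECT_ID = "example-project"          # hypothetical
    SUBSCRIPTION_ID = "claims-events-sub"   # hypothetical
    BUCKET_NAME = "example-landing-bucket"  # hypothetical

    storage_client = storage.Client(project=PROJECT_ID)
    bucket = storage_client.bucket(BUCKET_NAME)

    def callback(message):
        # Land each message as a raw object so downstream jobs can pick it up.
        blob = bucket.blob(f"raw/{uuid.uuid4()}.json")
        blob.upload_from_string(message.data)
        message.ack()

    subscriber = pubsub_v1.SubscriberClient()
    subscription_path = subscriber.subscription_path(PROJECT_ID, SUBSCRIPTION_ID)

    # Streaming pull: blocks until cancelled, landing messages as they arrive.
    future = subscriber.subscribe(subscription_path, callback=callback)
    with subscriber:
        future.result()

Writing one object per message keeps the sketch simple; a production pipeline would typically batch messages before writing to control object counts and cost.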
Creative IT, Hyderabad, India
Data Analyst    July 2014 to August 2017

Responsibilities:
- Utilized Informatica 6.1 to develop and maintain robust ETL processes, significantly enhancing data integration and data warehousing solutions for client projects.
- Managed data cleansing and normalization using Data Flux, ensuring high data quality and reliability for downstream analytical processing and decision-making.
- Conducted complex SQL queries on Oracle 9i databases to extract, analyze, and report data, supporting strategic business initiatives and client reporting requirements (see the sketch at the end of this section).
- Developed and executed PL/SQL scripts in TOAD to automate data tasks, increasing efficiency and reducing manual errors in data manipulation and reporting.
- Leveraged Quality Center 8.2 to manage and track data validation processes, improving data integrity and consistency across multiple data sources.
- Designed data validation checks using Informatica to ensure completeness and accuracy of data loads, preventing data corruption and loss.
- Optimized SQL queries and PL/SQL programs to enhance system performance and response times, contributing to smoother end-user experiences and client satisfaction.
- Collaborated with IT and business stakeholders to translate business requirements into technical specifications, ensuring accurate data solutions were implemented.
- Drove efficiency and continuity in data operations through meticulous documentation of processes and ETL workflows, while continuously enhancing data flow efficiency and adaptability in Informatica.
- Performed root cause analysis on data discrepancies and inconsistencies, employing TOAD to debug and rectify issues swiftly.
- Developed dashboards and reports using data extracted from Oracle 9i through SQL scripts, providing actionable insights to business users and clients.
- Managed database schema updates and modifications in Oracle 9i, ensuring structural data changes were in line with evolving business needs.
- Utilized flat files for importing and exporting data, effectively managing data transfer between systems and ensuring data was accurately represented across platforms.
- Assisted in database tuning and optimization using TOAD, enhancing performance and reducing load times for critical business operations.
- Implemented data security measures within Oracle databases, ensuring sensitive data was protected according to industry standards and compliance regulations.
- Facilitated user training sessions on data handling and report generation, empowering users with the tools to extract and analyze data effectively.
- Coordinated with the quality assurance team using Quality Center, ensuring all data solutions met rigorous testing standards before deployment.
- Monitored system performance post-implementation of data solutions, using insights to make adjustments and ensure optimal operation.
- Participated in project planning sessions, providing data-related insights that helped shape project scopes and timelines based on realistic data management capabilities.
- Contributed to data migration projects, overseeing the accurate transfer of data between legacy systems and new platforms using Informatica and flat files.

Environment: Informatica 6.1, Data Flux, Oracle 9i, Quality Center 8.2, SQL, TOAD, PL/SQL, Flat Files.
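The Oracle reporting and reconciliation bullets above describe extracting and validating data with SQL; below is a minimal sketch of that kind of completeness check driven from Python, assuming the cx_Oracle driver. The credentials, DSN, and table and column names are illustrative only, not taken from any client system.

    import cx_Oracle

    # Hypothetical connection details; real credentials would come from a vault.
    conn = cx_Oracle.connect("report_user", "secret", "db-host/ORCLPDB1")

    # Completeness check: compare staged vs. loaded row counts for one business date.
    SQL = """
        SELECT (SELECT COUNT(*) FROM stg_claims
                 WHERE load_date = TO_DATE(:d, 'YYYY-MM-DD')) AS staged,
               (SELECT COUNT(*) FROM dw_claims
                 WHERE load_date = TO_DATE(:d, 'YYYY-MM-DD')) AS loaded
          FROM dual
    """

    cur = conn.cursor()
    cur.execute(SQL, d="2017-03-31")
    staged, loaded = cur.fetchone()
    print(f"staged={staged}, loaded={loaded}, match={staged == loaded}")

    cur.close()
    conn.close()

The same query could equally be run directly in TOAD; wrapping it in a script simply makes the check repeatable across load dates.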