SENIOR DATA ENGINEER
Candidate's Name
Phone: PHONE NUMBER AVAILABLE | Email: EMAIL AVAILABLE | LinkedIn: LINKEDIN LINK AVAILABLE

PROFESSIONAL SUMMARY:
Seasoned Data Engineer with over 10 years of experience specializing in big data and cloud solutions across AWS, Azure, and GCP platforms.
Proficient in AWS services including EC2, S3, Redshift, Glue, and AWS Data Pipelines, enhancing data storage, processing, and analytics capabilities.
Skilled in deploying and managing cloud infrastructure using Terraform and CloudFormation, ensuring scalable, efficient, and secure environments.
Expertise in Azure data solutions such as Azure Data Factory, Azure Data Lake, and Azure SQL, facilitating robust data integration and analytics pipelines.
Advanced proficiency in GCP tools such as BigQuery, Dataproc, Dataflow, and GCP Databricks, optimizing data processing and storage solutions.
Developed complex ETL pipelines using Apache Spark, Hadoop, and Cloudera; experienced in processing large datasets with Spark SQL and Spark Streaming.
Hands-on experience with stream-processing technologies such as Kafka, managing real-time data feeds that support critical business functions.
Strong coding skills in Python, Shell Scripting, and PowerShell, enabling automation and scripting of complex data processes.
Utilized SQL, Hive, and Pig for data querying and manipulation, significantly improving data retrieval and analytics.
Proficient in DevOps practices and tools such as Git, Jira, Maven, and Azure DevOps, enhancing code quality and deployment efficiency.
Implemented comprehensive monitoring solutions using the ELK Stack, CloudWatch, and Splunk, ensuring high availability and performance of data applications.
Experience with data warehousing solutions such as AWS Redshift and Snowflake, providing scalable and cost-effective storage options.
Expert in managing database technologies such as DynamoDB, Oracle, and Azure SQL, ensuring data integrity and performance.
Skilled in data visualization and BI tools such as Power BI and MicroStrategy, transforming data into actionable insights and reports.
Demonstrated ability in data security and compliance, ensuring adherence to data governance and security standards across multiple platforms.
Strong background in continuous integration and delivery (CI/CD) pipelines using AWS CodeDeploy, CodePipeline, and Azure Pipelines, streamlining development cycles and reducing time to market.

TECHNICAL SKILLS:
AWS: EC2, S3, EBS, ELB, RDS, SNS, SQS, VPC, CloudFormation, CloudWatch, ELK Stack, DynamoDB, Kinesis, Redshift, Lambda, Data Pipelines, Glue, CodeDeploy, CodePipeline, CodeBuild, CodeCommit
Azure: Azure Data Factory, Azure Data Lake, Azure Function App, Azure WebApp, Azure SQL, Azure SQL MI, Cosmos DB, Blob Storage
Google Cloud (GCP): Dataproc, BigQuery, GCS, Dataflow, Dataprep, Cloud Composer, Cloud Pub/Sub, Cloud Storage Transfer Service, Cloud Spanner, Cloud SQL, Data Catalog, GCP Databricks
DevOps Tools: Bitbucket, Ansible, Terraform, Git, Jira, Azure DevOps, Maven, Splunk, SonarQube, Jenkins
Programming: Python, Shell Scripting, PowerShell, PySpark, SAS
Big Data Technologies: Hadoop, Spark, Hive, Pig, HBase, Spark SQL, Spark Streaming, Sqoop, Cloudera, MapReduce, HDFS, Oozie, Zookeeper, Flume, Kafka
Database Management: SQL, Oracle, Informatica, Teradata, Snowflake
Web Technologies: JBoss, WebSphere, WebLogic, SSH, YAML
BI Tools: Power BI, MicroStrategy, Tableau

CERTIFICATIONS:
Azure Data Engineer Associate: https://learn.microsoft.com/api/credentials/share/en-us/Candidate's Name -0133/AF521D6EF538906?sharingId=A460008D761A19A9
AWS Data Engineer Associate: https://www.credly.com/badges/bb460b1f-169c-401b-8d64-ceec05b29c00/public_url

PROFESSIONAL EXPERIENCE:

SENIOR DATA ENGINEER, Mayo Clinic, Rochester, MN | November 2021 to Present
Responsibilities:
Developed and maintained scalable data pipelines and built new API integrations to support continuous ingestion and extraction using AWS Data Pipelines and AWS Glue.
Engineered and optimized data warehouse solutions using AWS Redshift, enhancing data retrieval efficiency for medical research purposes.
Implemented complex data transformations and aggregations to support clinical research and decision-making processes.
Designed and implemented serverless applications using AWS Lambda, reducing operational costs and improving scalability for healthcare applications.
Developed and optimized data processing pipelines using Scala and Apache Spark, enhancing the performance and scalability of healthcare data analytics.
Designed and deployed machine learning models on AWS to predict patient outcomes, utilizing Spark MLlib and Hive for data processing.
Built real-time data processing pipelines with AWS Lambda, S3, and Kinesis, enhancing the efficiency of patient data analytics (a sketch of this pattern follows this section).
Automated healthcare workflows, including appointment scheduling and medical billing, through Lambda functions and AWS Step Functions.
Utilized AWS Kinesis for real-time data streaming, facilitating immediate data processing and analysis in healthcare studies.
Orchestrated data flow between systems using Apache Kafka and Sqoop, ensuring efficient data synchronization and communication.
Enhanced data collection and storage strategies by integrating DynamoDB and Cloudera with existing AWS solutions.
Automated deployment processes using AWS CodeDeploy, CodePipeline, and CodeBuild to streamline code updates and minimize downtime.
Applied Terraform scripts for infrastructure-as-code deployments, promoting consistency across development, testing, and production environments.
Scripted automation tools with Python, Shell Scripting, and PowerShell to reduce manual tasks and enhance system efficiency.
Automated code quality checks using SonarQube to maintain high standards of code reliability and maintainability.
Managed and monitored server infrastructure and databases in the cloud using AWS EC2, S3, RDS, and CloudWatch, ensuring high availability and performance.
Configured and maintained AWS VPC, EBS, ELB, and SQS services to meet the security and networking requirements of the health data systems.
Implemented security measures and compliance standards using AWS SNS and IAM policies to protect sensitive patient data.
Monitored application logs and performance metrics using Splunk and the ELK Stack to identify and resolve potential issues promptly.
Implemented comprehensive logging and error-handling mechanisms within data applications using AWS CloudWatch and the ELK Stack.
Designed disaster recovery plans using AWS CloudFormation templates to ensure robust data backup and restore systems.
Conducted detailed data analysis using Spark SQL and Spark Streaming to support clinical decisions and administrative strategies.
Developed Pig scripts to cleanse and preprocess data, making it ready for analysis and reporting in health research.
Managed various data formats including JSON, XML, and CSV to facilitate seamless data integration and interoperability within healthcare systems.
Leveraged Bitbucket for source control management and collaborated with the development team through Git workflows.
Managed project tasks and timelines using Jira, coordinating with cross-functional teams to ensure timely delivery of data engineering projects.
Utilized deep knowledge of SAP tables to integrate SAP systems with AWS services, enhancing data accessibility and analytics capabilities.
Developed efficient data extraction and transformation processes using ABAP CDS views, optimizing performance and scalability for healthcare operations.
Collaborated with system architects to design data solutions that align with Mayo Clinic's strategic goals, employing a mix of AWS services and open-source technologies such as Maven.
Administered JBoss and WebSphere application servers in a Unix/Linux environment, maintaining operational stability.
Enhanced the data analytics framework by designing and implementing new features using PySpark.
Led scalable job development to handle large volumes of patient records, clinical data, and medical images.
Created AWS QuickSight dashboards to visualize patient outcomes.
Integrated new data sources, including real-time streaming data, and developed data quality checks and validation routines to ensure accuracy and reliability.
Demonstrated expertise in securely managing credentials and other sensitive information using AWS Secrets Manager.
Implemented robust security measures and access controls to protect sensitive data, ensuring compliance with healthcare regulations.
Supported and managed HBase databases, optimizing storage and retrieval processes to handle large volumes of data.
Secured data transfers and connectivity within cloud environments using AWS VPC and SQS, ensuring compliance with health data regulations.
Implemented robust data ingestion pipelines utilizing message queues such as TIBCO, ensuring efficient data flow and real-time processing capabilities.
Environment: AWS (EC2, S3, EBS, ELB, RDS, SNS, SQS, VPC, CloudFormation, CloudWatch, ELK Stack, Data Pipelines, Redshift, Glue, Lambda), PySpark, Bitbucket, Ansible, Python, Shell Scripting, PowerShell, Git, Jira, JBoss, Terraform, Maven, WebSphere, Unix/Linux, DynamoDB, Kinesis, CodeDeploy, CodePipeline, CodeBuild, CodeCommit, Splunk, SonarQube, Spark, Hive, Pig, Spark SQL, Spark Streaming, HBase, Sqoop, Kafka, Cloudera.
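The real-time pipeline bullet above (Lambda consuming Kinesis records and landing results for downstream analytics) corresponds to a common serverless pattern. Below is a minimal sketch of that pattern, assuming a hypothetical `patient_vitals` DynamoDB table and illustrative field names; none of these identifiers come from the original project.

```python
import base64
import json

import boto3

# Hypothetical destination table; a real deployment would read this from configuration.
TABLE = boto3.resource("dynamodb").Table("patient_vitals")


def handler(event, context):
    """Lambda handler invoked with a batch of Kinesis records."""
    for record in event["Records"]:
        # Kinesis delivers each payload base64-encoded inside the event envelope.
        payload = json.loads(base64.b64decode(record["kinesis"]["data"]))

        # Keep only well-formed readings; the field names are illustrative.
        if "patient_id" not in payload or "heart_rate" not in payload:
            continue

        TABLE.put_item(Item={
            "patient_id": payload["patient_id"],
            "observed_at": payload.get("observed_at", ""),
            "heart_rate": int(payload["heart_rate"]),
        })

    return {"processed": len(event["Records"])}
```

In practice the function would be attached to the stream through an event source mapping, with CloudWatch alarms on iterator age to catch backlogs, which lines up with the monitoring tooling listed in this role.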
SENIOR DATA ENGINEER, Chevron Corporation, Santa Rosa, NM | August 2019 to October 2021
Responsibilities:
Developed and managed big data workflows using GCP Dataflow and GCP Dataproc to process large datasets from oil exploration activities.
Designed and implemented data storage solutions using GCP Cloud Storage and BigQuery, optimizing data accessibility and query performance.
Automated data ingestion pipelines using Sqoop and Cloud Pub/Sub to streamline the transfer and availability of seismic data across global platforms.
Configured and maintained Cloud SQL and Cloud Spanner databases to support high-throughput transactions and real-time data analysis.
Leveraged PySpark within GCP Databricks to conduct complex data transformations and analytics on upstream oil production data.
Created data models and schemas in BigQuery to support deep analytics and generate insights on refinery efficiency (see the query sketch after this section).
Leveraged Cognite Data Fusion to integrate, contextualize, and analyze industrial data from multiple sources, optimizing data-driven decision-making for oil and gas operations.
Implemented Cognite Data Fusion for real-time data ingestion and synchronization, ensuring seamless data flow and operational efficiency.
Utilized GCP Dataprep to clean and preprocess datasets, ensuring data quality and consistency for downstream processing.
Orchestrated multi-source data integration strategies using Hive and Hadoop, aligning with Chevron's data governance standards.
Developed scalable and secure data solutions with GCP services to support Chevron's initiatives in biofuels and alternative energy technologies.
Engineered scalable data ingestion solutions leveraging IBM MQ to handle diverse data formats such as JSON, XML, and CSV.
Optimized data ingestion processes to enhance data availability and consistency across multiple enterprise applications in the energy sector.
Designed and implemented new data processing features in the oil and gas data analytics framework using PySpark on GCP.
Developed scalable PySpark jobs to handle increasing volumes of exploration and production data.
Analyzed seismic data, drilling reports, and production logs, and built interactive dashboards in Google Data Studio.
Integrated new data sources, including real-time sensor data, and implemented data quality checks and validation routines on GCP.
Employed Python scripts to automate data extraction, transformation, and loading processes, reducing manual effort and improving cycle times.
Monitored and optimized data pipelines using GCP's Cloud Composer to ensure efficient execution of scheduled tasks.
Built interactive dashboards and reports using Power BI and GCP's BigQuery to provide actionable insights into operational data.
Optimized data processing costs and efficiency by implementing advanced querying techniques in BigQuery and SQL Database.
Ensured data security and compliance with industry regulations by implementing robust controls and monitoring mechanisms in GCP.
Conducted regular system audits to ensure data integrity and accuracy within GCP environments.
Implemented disaster recovery and business continuity strategies using GCP's Cloud Storage Transfer Service.
Collaborated with IT and business teams to define and achieve data analytics goals that support Chevron's strategic business initiatives.
Provided technical leadership and guidance in the adoption of new GCP technologies to advance Chevron's data-driven decision making.
Trained non-technical team members on best practices for data handling and visualization on GCP.
Analyzed and improved existing data architectures, proposing enhancements to increase system scalability and performance on GCP.
Designed and executed data migration strategies from on-premises systems to GCP, ensuring minimal disruption to ongoing operations.
Facilitated data sharing across different business units by utilizing GCP's Data Catalog to document and catalog data assets.
Integrated Snowflake with GCP for enhanced data warehousing capabilities, enabling more complex queries and reports.
Evaluated and recommended improvements to Chevron's existing data workflows by benchmarking against industry standards and leveraging GCP solutions.
Environment: GCP (Dataflow, Dataproc, BigQuery, Dataprep, Cloud Composer, Cloud Pub/Sub, Cloud Storage Transfer Service, Cloud Spanner, Cloud SQL, Data Catalog, Databricks, Cloud Storage), PySpark, SAS, Hive, Sqoop, Teradata, Hadoop, Python, Cognite Data Fusion, Snowflake, Power BI, SQL Database, IBM MQ.
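The BigQuery modeling work above lends itself to a short illustration. The sketch below shows the kind of aggregation query that could feed a refinery-efficiency view using the google-cloud-bigquery client; the `ops.daily_production` table and its columns are hypothetical placeholders, not Chevron's actual schema.

```python
from google.cloud import bigquery

client = bigquery.Client()  # relies on application-default credentials

# Hypothetical table and columns; the real schema is not part of the resume.
query = """
    SELECT
      facility_id,
      DATE_TRUNC(produced_on, MONTH) AS month,
      SUM(output_barrels) / SUM(input_barrels) AS yield_ratio
    FROM `ops.daily_production`
    WHERE produced_on >= @since
    GROUP BY facility_id, month
    ORDER BY month, facility_id
"""

job = client.query(
    query,
    job_config=bigquery.QueryJobConfig(
        query_parameters=[
            bigquery.ScalarQueryParameter("since", "DATE", "2021-01-01"),
        ]
    ),
)

for row in job.result():
    print(row.facility_id, row.month, round(row.yield_ratio, 3))
```

A result set like this would typically be materialized as a view or scheduled query and surfaced through the Power BI and Data Studio dashboards mentioned in this role.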
DATA ENGINEER, UBS, Weehawken, NJ | September 2017 to July 2019
Responsibilities:
Designed and implemented data integration workflows using Azure Data Factory (ADF) to facilitate seamless data movement and transformation across various banking services.
Configured and maintained Azure Data Lake storage solutions to optimize data accessibility and scalability for financial analytics.
Utilized Databricks to perform complex data processing tasks, enabling real-time analytics on financial data.
Developed scalable pipelines in Azure for data ingestion and distribution using Sqoop, Flume, and Blob Storage.
Managed and optimized Cosmos DB for real-time data access and high-throughput applications to support critical banking operations.
Leveraged Apache Kafka and Azure Event Hubs for stream processing of transaction data to detect patterns and anomalies (a streaming sketch follows this section).
Implemented Azure SQL Database and Azure SQL Managed Instance for structured data storage and complex query execution.
Automated deployment processes using Azure DevOps, enhancing collaboration and reducing delivery times of data-driven applications.
Utilized Power BI to create insightful visualizations and dashboards that help stakeholders make informed decisions.
Developed Python scripts to automate data transformation and cleanup processes, improving data quality and processing time.
Ensured data security and compliance with financial regulations by implementing robust security measures in Azure environments.
Administered Azure Web Apps and Function Apps to host and manage microservices that interact with backend data systems.
Engineered large-scale data processing workflows using Scala and Spark, enabling efficient analysis of financial data.
Improved data processing speed and accuracy, facilitating real-time analytics and reporting for various financial services.
Orchestrated data migrations from legacy systems to Azure, ensuring accuracy and minimal downtime.
Monitored and tuned Hive and HBase applications within Azure to optimize performance and resource utilization.
Utilized Cloudera on Azure to manage large-scale data processing tasks involving HDFS and MapReduce.
Configured Azure Resource Manager (ARM) templates and YAML configurations to automate infrastructure provisioning and management.
Collaborated with IT and business units to translate business needs into scalable data solutions on Azure.
Implemented Maven and Git workflows for version control and continuous integration of data processing jobs.
Managed user access and security settings within Azure SQL and data services to protect sensitive financial information.
Used SSH for secure data transfers and remote management of Azure resources.
Performed capacity planning and cost management to optimize financial expenditure on Azure services.
Supported data retrieval and reporting needs by maintaining robust SQL Server databases and their integration with Azure.
Automated alerts and monitoring using Azure Monitor to ensure high availability and performance of data services.
Participated in Agile development cycles, collaborating with teams through Jira to manage tasks and sprints efficiently.
Evaluated and integrated new Azure technologies and updates to enhance the data architecture and support future business needs.
Environment: Azure Data Factory, Databricks, Azure Data Lake, Spark, Hive, HBase, Sqoop, Flume, ADF, Blob Storage, Cosmos DB, MapReduce, HDFS, Cloudera, SQL, ACR, Azure Function App, Azure WebApp, Azure SQL, Azure SQL MI, SSH, YAML, WebLogic, Python, Azure DevOps, Git, Maven, Jira, Apache Kafka, Azure, Power BI, Unix, SQL Server.
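The Kafka/Event Hubs bullet above maps onto a standard Spark Structured Streaming pattern. The sketch below, assuming placeholder broker, topic, schema, and threshold values, reads from a Kafka-compatible endpoint (Event Hubs also exposes one) and flags accounts with unusually high five-minute spend; it requires the spark-sql-kafka package on the classpath.

```python
from pyspark.sql import SparkSession, functions as F
from pyspark.sql.types import StructType, StringType, DoubleType, TimestampType

spark = SparkSession.builder.appName("txn-anomaly-sketch").getOrCreate()

# Illustrative transaction schema; real message layouts are not part of the resume.
schema = (StructType()
          .add("account_id", StringType())
          .add("amount", DoubleType())
          .add("ts", TimestampType()))

raw = (spark.readStream.format("kafka")
       .option("kafka.bootstrap.servers", "broker:9092")   # placeholder endpoint
       .option("subscribe", "transactions")                 # placeholder topic
       .load())

# Decode the Kafka value bytes into typed columns.
txns = (raw.selectExpr("CAST(value AS STRING) AS json")
        .select(F.from_json("json", schema).alias("t"))
        .select("t.*"))

# Flag accounts whose five-minute spend exceeds an illustrative threshold.
flagged = (txns.withWatermark("ts", "10 minutes")
           .groupBy(F.window("ts", "5 minutes"), "account_id")
           .agg(F.sum("amount").alias("total_spend"))
           .where(F.col("total_spend") > 10000))

query = (flagged.writeStream
         .outputMode("update")
         .format("console")   # a real job would write to a sink such as Delta or SQL
         .start())
query.awaitTermination()
```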
HADOOP DEVELOPER, OJAS Innovative Technologies | July 2014 to April 2017
Responsibilities:
Designed and implemented Hadoop deployment strategies, including backup and recovery systems, adhering to architectural standards.
Initiated and conducted proof-of-concept (POC) studies for the Apache Hadoop framework to evaluate its effectiveness for various applications.
Performed multiple POCs to determine the alignment of big data solutions with specific business objectives.
Configured, managed, and monitored Hadoop clusters using Cloudera Manager and Ambari, ensuring optimal performance and high availability.
Developed and implemented MapReduce jobs for diverse tasks including log analysis, recommendations, and data analytics.
Engineered MapReduce jobs to compile daily activity reports by aggregating data from various sources and storing results in HDFS (a streaming-style sketch follows this section).
Evaluated HDFS usage and system architecture to ensure scalability and fault tolerance, installing and configuring essential Hadoop components including HDFS, MapReduce, Pig, Hive, and Sqoop.
Authored Pig scripts to automate MapReduce jobs and conducted ETL operations on HDFS data.
Processed HDFS data to create external Hive tables, facilitating the analysis of daily visitor metrics, page views, and top-selling products.
Utilized Sqoop to transfer analyzed data back to HDFS for enhanced report generation.
Employed MapReduce and Sqoop for the efficient loading, aggregation, storage, and analysis of web log data across multiple servers.
Crafted Hive queries to support data analysts in their exploratory data analysis.
Optimized MapReduce algorithms using combiners and partitioners, improving performance and managing application optimization for an HDFS/Cassandra cluster.
Executed black-box testing for a web-based application interfacing with a mainframe system.
Designed and implemented Hadoop workflows using Hive, Pig, and MapReduce to process large data volumes, reducing processing time by 30% and enhancing system performance.
Developed and maintained scalable ETL pipelines with Apache Hadoop, ensuring seamless data integration from databases, flat files, and real-time streams, enhancing data accessibility and reliability.
Designed and implemented job scheduling workflows using Apache Oozie, automating data processing tasks and ensuring efficient execution of ETL processes.
Environment: MapReduce, Hive, Pig, Sqoop, Oracle, Informatica, MicroStrategy, Cloudera, Hadoop scheduler, Cloudera Manager, Oozie, Zookeeper, Hadoop administration.
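The daily activity report bullet above is the kind of aggregation that a Hadoop Streaming job handles well. The sketch below assumes tab-separated web-log lines with the requested page in the second column; the log format, file name, and output layout are illustrative, not taken from the original project.

```python
#!/usr/bin/env python
"""activity_report.py -- Hadoop Streaming mapper/reducer for a daily page-view count.

Run as:  activity_report.py map     (mapper phase)
         activity_report.py reduce  (reducer phase)
"""
import sys


def mapper(stream):
    # Emit "page<TAB>1" for each log line; assumes the page URL is the second column.
    for line in stream:
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 2:
            print("%s\t1" % fields[1])


def reducer(stream):
    # Hadoop delivers mapper output sorted by key, so a running total per key suffices.
    current_page, count = None, 0
    for line in stream:
        page, value = line.rstrip("\n").split("\t", 1)
        if page != current_page:
            if current_page is not None:
                print("%s\t%d" % (current_page, count))
            current_page, count = page, 0
        count += int(value)
    if current_page is not None:
        print("%s\t%d" % (current_page, count))


if __name__ == "__main__":
    (mapper if len(sys.argv) > 1 and sys.argv[1] == "map" else reducer)(sys.stdin)
```

A job like this would typically be launched through the Hadoop Streaming jar, passing the script as both the map and reduce commands and pointing the input and output at HDFS paths, with the results then exposed through external Hive tables as described above.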
EDUCATION:
Malla Reddy Engineering College, June 2010 to May 2014 | Bachelor's, CSE