Candidate's Name
PHONE NUMBER AVAILABLE
EMAIL AVAILABLE

Data Engineer

PROFESSIONAL SUMMARY:
Senior Data Engineer with over 9 years of experience in developing and maintaining large-scale data architectures and ETL pipelines.
Expert in developing robust ETL pipelines using Talend, enhancing data integration and workflow automation across various domains.
Proficient with Apache Airflow for orchestrating complex data workflows, ensuring efficient scheduling and monitoring of batch jobs.
Skilled in deploying and managing SQL databases such as MySQL and PostgreSQL, ensuring optimal performance and data integrity.
Utilized Python extensively to script data processing tasks, automate data flows, and integrate systems efficiently.
Developed advanced analytics solutions using Power BI for dynamic dashboarding and insightful business intelligence reporting.
Managed large-scale data warehouses with Amazon Redshift, optimizing storage and query performance for faster data retrieval.
Designed and implemented data solutions on cloud platforms like AWS, leveraging services such as AWS EMR and S3 for scalable data storage and processing.
Engineered real-time data streaming processes using Apache Kafka, facilitating immediate data availability and responsive analytics.
Utilized Docker for containerization, ensuring consistent environments and seamless deployments across development and production systems.
Applied Terraform for infrastructure as code, automating the provisioning of cloud resources and maintaining state configurations.
Integrated machine learning models into data workflows using TensorFlow, enhancing predictive capabilities and decision support systems.
Deployed and managed big data technologies such as Apache Hadoop and Spark, handling massive datasets efficiently in distributed environments.
Implemented secure data access and identity management using Azure Active Directory, enhancing security and compliance across cloud services.
Developed and optimized data pipelines using SSIS, enabling efficient data transformation and loading across enterprise systems.
Configured Jenkins for continuous integration and delivery, automating the build and deployment processes to enhance development workflows.
Managed version control for data projects using Git, facilitating collaboration and maintaining a robust codebase across team environments.
Utilized PySpark for processing diverse file formats in big data environments, enhancing data manipulation and aggregation capabilities.
Developed and maintained scalable and secure data solutions with Azure SQL Database and Azure Data Lake Storage, supporting complex data operations.
Employed Databricks for unified analytics, executing complex data processing tasks and machine learning workflows in a managed Spark environment.
Enhanced operational efficiency and project management using Azure DevOps, streamlining workflows and collaboration in cloud-based environments.
Applied Agile methodologies to manage data projects, ensuring adaptability and swift response to changing business requirements.
Implemented Azure Cosmos DB for globally distributed database management, optimizing latency and scalability for application-specific data needs.
Utilized Azure HDInsight for processing large-scale data, leveraging Hadoop and Spark clusters to handle high-throughput analytics.
Configured Azure Data Factory for data movement and transformation, integrating various data stores and services in the cloud.
Developed documentation and compliance strategies, ensuring adherence to data governance standards and regulatory requirements across all platforms.

TECHNICAL SKILLS:
Programming Languages: Python, SQL, T-SQL
Big Data Technologies: Apache Hadoop, Apache Spark, AWS EMR, Apache Kafka, PySpark
ETL Tools: Talend, Apache Airflow, SSIS
Database Management: MySQL, PostgreSQL, Amazon Redshift, Azure SQL Database, Azure Cosmos DB, Snowflake
Cloud Platforms: Azure, AWS
DevOps Tools: Git, Jenkins, Azure DevOps, Docker, Terraform
Data Analysis Tools: Databricks, Power BI, TensorFlow, Tableau, Hive
Other Skills: Data warehousing, Data streaming, Agile methodologies, Data security, MS-Office

PROFESSIONAL EXPERIENCE:

Client: Markel Corp, Glen Allen, VA  Feb 2023 to Present
Role: Senior Data Engineer
Roles & Responsibilities:
Developed and maintained ETL pipelines using Azure Data Factory, integrating complex data from various sources efficiently.
Managed and optimized Azure SQL Databases to support high-volume data storage and quick data retrieval for analysis.
Implemented data warehousing solutions with Azure Cosmos DB, enhancing data storage capabilities and query performance.
Designed and executed data integration solutions using Azure HDInsight, processing large datasets with improved efficiency.
Automated real-time data streaming processes with Apache Beam, facilitating immediate data availability for operational decision-making.
Configured Azure Active Directory to manage secure access and identity verification, ensuring compliance with data security standards.
Integrated TensorFlow into the data processing workflows, developing predictive models to aid strategic business decisions.
Utilized Databricks for complex data processing tasks, deploying machine learning models to drive insights and innovations (see the illustrative sketch following this role).
Ensured high availability and disaster recovery using Docker containers, maintaining consistent service levels during critical operations.
Maintained detailed documentation of all data architectures and processes, ensuring compliance with industry regulations and standards.
Deployed Talend for data transformation and integration, streamlining data flows and enhancing data quality.
Utilized Agile methodologies to manage data engineering projects, ensuring timely delivery and responsiveness to business needs.
Conducted thorough data analysis using SQL and Python, deriving actionable insights to support business objectives.
Facilitated secure data transfers and storage using Azure Data Lake Storage, optimizing data accessibility and integrity.
Configured and managed data pipelines using Azure Data Factory, automating data ingestion and transformation processes.
Enhanced data retrieval and analytics capabilities by implementing Azure SQL Database, providing robust support for complex queries.
Optimized data processing and management using Azure HDInsight, handling large volumes of data with high efficiency.
Developed and maintained robust data security measures using Azure Key Vault, securing sensitive data and encryption keys.
Implemented Azure DevOps for project tracking and management, enhancing team collaboration and workflow efficiency.
Utilized Azure Machine Learning to develop and deploy models, improving predictive analytics and business forecasting.
Developed data streaming applications with Apache Beam, enabling real-time data processing and analytics.
Applied data warehousing techniques using Azure Cosmos DB, improving scalability and performance of business intelligence tools.
Managed version control and collaboration using Git, ensuring code integrity and facilitating team development efforts.
Executed data migration and integration projects with Talend, enhancing data availability and system interoperability.
Applied analytical techniques using Python and SQL to extract, transform, and load data, supporting comprehensive business reporting.

Environment: Azure Data Factory, Azure SQL Database, Azure Cosmos DB, Azure HDInsight, Apache Beam, Azure Active Directory, TensorFlow, Databricks, Docker, Talend, Agile methodologies, SQL, Python, Azure Data Lake Storage, Azure Key Vault, Azure DevOps, Azure Machine Learning, Git
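The following is a minimal, illustrative PySpark sketch of the kind of Databricks cleanse-and-curate job described in this role. The Azure Data Lake Storage paths and column names are hypothetical placeholders, not code from the actual engagement.

```python
# Minimal, illustrative PySpark sketch of a Databricks-style cleanse-and-curate job.
# The ADLS paths, container names, and column names below are hypothetical placeholders.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("claims_daily_curation").getOrCreate()

# Read raw claim records landed in Azure Data Lake Storage (hypothetical path)
raw = spark.read.parquet("abfss://raw@examplelake.dfs.core.windows.net/claims/")

# Basic cleansing: deduplicate, normalize types, drop records missing an amount
curated = (
    raw.dropDuplicates(["claim_id"])
       .withColumn("claim_amount", F.col("claim_amount").cast("double"))
       .filter(F.col("claim_amount").isNotNull())
       .withColumn("load_date", F.current_date())
)

# Write the curated output back to the lake, partitioned for downstream reporting
(curated.write
        .mode("overwrite")
        .partitionBy("load_date")
        .parquet("abfss://curated@examplelake.dfs.core.windows.net/claims/"))
```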
Client: Vertex Inc, Cerritos, CA  May 2021 to Feb 2023
Role: Data Engineer
Roles & Responsibilities:
Engineered data pipelines using Apache Airflow, enhancing automation, monitoring, and execution of complex data workflows.
Configured Apache Kafka for real-time data streaming, enabling immediate data availability for urgent healthcare analytics needs.
Managed Azure SQL Database and Azure Data Lake Storage (ADLS), ensuring secure, scalable, and efficient data storage.
Implemented scalable big data processing solutions with Azure HDInsight, processing large volumes of healthcare data.
Automated ETL processes using Databricks, integrating data from disparate sources to streamline healthcare data analytics.
Secured sensitive healthcare data using Azure Key Vault, managing encryption keys and secrets efficiently.
Developed predictive models using Azure Machine Learning to enhance patient care and operational efficiency.
Facilitated seamless data integrations using StreamSets, improving data reliability and accessibility across healthcare systems.
Utilized Azure DevOps for project management, continuous integration, and delivery, ensuring timely updates and releases.
Employed Agile methodologies to manage data projects, enhancing team responsiveness to changes in healthcare data requirements.
Enhanced data security and compliance by configuring Azure Active Directory for robust identity and access management.
Deployed Docker containers to provide a consistent environment for application development and data processing.
Used Apache Kafka to handle real-time data feeds, supporting instant analytics and decision-making in healthcare services.
Managed and optimized healthcare data workflows using Apache Airflow, ensuring efficient task scheduling and data orchestration.
Developed and maintained data models in Azure SQL Database, supporting complex data analysis and reporting needs.
Optimized data ingestion and integration processes using Azure Data Factory, connecting various healthcare information systems.
Configured Terraform to manage infrastructure as code, automating the provisioning of cloud resources for data projects.
Integrated StreamSets to facilitate real-time data processing and distribution, enhancing operational agility in healthcare services.
Utilized Python and SQL for data manipulation and querying, extracting valuable insights from healthcare datasets.
Implemented data backup and recovery solutions in Azure, ensuring data durability and availability in case of system failures.
Developed comprehensive documentation for all data processes and systems, adhering to healthcare regulations and standards.
Applied data encryption techniques using Azure Key Vault, safeguarding patient data and ensuring compliance with privacy laws.
Coordinated with healthcare professionals to tailor data solutions that meet clinical and administrative needs effectively.
Enhanced system performance and data processing capabilities using Databricks, facilitating advanced analytics and research.
Automated data quality checks and balances using Azure Data Factory, ensuring the accuracy and reliability of healthcare data.
Engineered data pipelines using Apache Airflow and Snowflake, enhancing automation, monitoring, and execution of complex data workflows.
Automated ETL processes using Databricks and Snowflake, integrating data from disparate sources to streamline healthcare data analytics.

Environment: Apache Airflow, Apache Kafka, Azure services (SQL Database, Data Lake Storage, HDInsight, Key Vault, Machine Learning, DevOps, Active Directory), Snowflake, Databricks, StreamSets, Azure Data Factory, Terraform, Docker, Python, SQL

Client: JD Finish Line, Indianapolis, IN  Jan 2019 to Apr 2021
Role: Big Data Engineer
Roles & Responsibilities:
Developed big data solutions with Apache Hadoop, managing large datasets across distributed systems to enhance data processing.
Utilized Apache Spark on AWS EMR for efficient large-scale data processing, reducing processing time and improving analytics capabilities.
Configured AWS Glue for automated data cataloging and ETL processes, streamlining data integration and preparation workflows.
Implemented real-time data streaming using Apache Kafka, enabling dynamic data feeds for immediate analytical processing.
Employed Docker to containerize applications, ensuring consistent environments across development, testing, and production stages.
Managed infrastructure using Terraform, automating the setup and scaling of cloud resources to meet project demands.
Used Hadoop YARN for resource management, optimizing the allocation and utilization of computational resources in big data applications.
Integrated AWS S3 for scalable and secure data storage, providing a robust solution for storing and retrieving vast amounts of data.
Applied the MapReduce programming model in Hadoop to process large datasets with a distributed algorithm on a cluster.
Leveraged Apache Spark for complex data transformations and analytics, accelerating data insights within the retail domain.
Developed and maintained ETL pipelines using AWS Data Pipeline, automating data movement and transformation processes.
Utilized Python and SQL for scripting and data manipulation tasks, enhancing data processing and query execution.
Configured Apache Kafka for efficient message brokering and stream processing, enhancing real-time data handling capabilities.
Implemented data security measures using AWS IAM (Identity and Access Management) to manage access to AWS resources securely.
Employed AWS EMR for managing big data frameworks, facilitating efficient analysis and processing of large datasets.
Optimized data storage and retrieval operations using AWS S3, ensuring high availability and data durability.
Developed data applications using Docker, simplifying deployment processes and enhancing application portability and scalability.
Managed version control with Git, facilitating effective collaboration and source code management across development teams.
Orchestrated data workflows with Apache Airflow, automating and monitoring data pipelines to ensure operational efficiency.
Analyzed data using Apache Spark and Hadoop, extracting actionable insights to drive business decisions and strategies in the retail sector.
Implemented Druid for real-time analytics and HBase for scalable data storage, enhancing query performance and data accessibility.

Environment: Apache Hadoop, Apache Spark, AWS EMR, AWS Glue, Apache Kafka, Docker, Terraform, AWS S3, AWS IAM, AWS Data Pipeline, Python, Git, Apache Airflow, Druid, HBase, and Hadoop YARN

Client: Optimum Info System Pvt Ltd, Chennai, India  Jul 2016 to Sep 2018
Role: ETL Pipeline Developer
Roles & Responsibilities:
Designed and implemented ETL pipelines using Python and SSIS, enhancing data transformation and loading for business intelligence.
Managed large datasets within Amazon Redshift, optimizing data storage and query execution for enhanced analytics performance.
Utilized Apache Airflow for orchestrating complex data workflows, automating ETL processes and improving operational efficiency.
Configured Jenkins for continuous integration and delivery, automating the build and deployment processes to enhance development workflows.
Developed data integration solutions using Git for version control, facilitating effective collaboration and source code management.
Processed diverse file formats using PySpark, enabling efficient data manipulation and analysis across different data sources.
Enhanced SQL query performance to optimize data retrieval and reporting, supporting critical decision-making processes.
Automated data operations using Python, creating scripts that streamlined workflows and improved data handling efficiency.
Implemented robust data pipelines with Amazon Redshift, ensuring scalable and secure data warehousing solutions.
Configured Apache Airflow to manage task dependencies and scheduling, enhancing data process automation and reliability.
Utilized Jenkins to automate testing and deployment, improving code quality and reducing time to market for data applications.
Managed version control using Git, ensuring data integrity and facilitating seamless collaboration across data teams.
Developed SQL queries for complex data analysis, providing insights that supported strategic business initiatives.
Implemented data quality checks within ETL processes using SSIS, ensuring accuracy and consistency of data loads.
Orchestrated data migration projects using Python and SSIS, ensuring seamless data integration and minimal downtime.
Leveraged Python to automate and optimize data processing tasks, enhancing productivity and reducing manual efforts.
Utilized PySpark for handling large-scale data processing, improving performance and scalability of data operations.
Configured and maintained Amazon Redshift databases, optimizing storage and query performance for large datasets.

Environment: Python, SSIS, Apache Airflow, Jenkins, Git, PySpark, and Amazon Redshift

Client: Excellent WebWorld, Ahmedabad, India  Oct 2014 to Jun 2016
Role: SQL Developer
Roles & Responsibilities:
Developed and optimized SQL queries and procedures to manage MySQL and PostgreSQL databases, enhancing database performance.
Utilized Talend to integrate various data sources, streamlining data workflows and improving data quality.
Created and maintained data pipelines using Python, automating data collection and processing tasks.
Designed dashboards using Power BI, providing actionable insights through visual analytics to support business decisions.
Configured Amazon Redshift for data warehousing, optimizing data storage and analytics capabilities.
Managed version control with Git, enhancing source code management and collaboration among development teams.
Automated data processes using Apache Airflow, improving efficiency and reliability of data operations.
Developed SQL scripts for data manipulation and reporting, enhancing data accessibility and usability.
Implemented Talend for data transformation and integration, ensuring accurate and timely availability of data across systems.
Utilized Python for scripting and automation, enhancing operational efficiency and reducing manual errors.
Configured and maintained PostgreSQL databases, optimizing performance and ensuring data security.
Leveraged Power BI for creating dynamic reports and dashboards, enabling real-time business intelligence.
Applied best practices in database management using MySQL, ensuring robust data storage and retrieval.
Developed comprehensive documentation for data processes, facilitating knowledge transfer and regulatory compliance.
Engineered automated workflows using Apache Airflow, streamlining data tasks and enhancing process reliability (see the illustrative sketch at the end of this resume).

Environment: MySQL, PostgreSQL, Talend, Python, Power BI, Amazon Redshift, Git, Apache Airflow

EDUCATION:
Bachelor of Technology (B.Tech) in Information Technology from Osmania University, Hyderabad, Telangana, India - 2014
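ILLUSTRATIVE CODE SAMPLE:
A minimal sketch of the Apache Airflow orchestration pattern referenced in the ETL Pipeline Developer and SQL Developer roles above. The DAG id, schedule, and extract/transform/load callables are hypothetical placeholders, not code from any client engagement.

```python
# Illustrative Apache Airflow DAG sketch of an extract-transform-load workflow.
# The DAG id, schedule, and task callables are hypothetical placeholders.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def extract_orders(**context):
    # Placeholder: pull source data (e.g., from an operational database export)
    print("extracting orders")


def transform_orders(**context):
    # Placeholder: cleanse and reshape the extracted data
    print("transforming orders")


def load_orders(**context):
    # Placeholder: load the curated data into the warehouse (e.g., Amazon Redshift)
    print("loading orders into the warehouse")


with DAG(
    dag_id="orders_etl_daily",
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    extract = PythonOperator(task_id="extract_orders", python_callable=extract_orders)
    transform = PythonOperator(task_id="transform_orders", python_callable=transform_orders)
    load = PythonOperator(task_id="load_orders", python_callable=load_orders)

    # Run the tasks sequentially: extract, then transform, then load
    extract >> transform >> load
```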