Candidate's Name
DATA ENGINEER
Email: EMAIL AVAILABLE | PHONE NUMBER AVAILABLE | KS

SUMMARY
- 4+ years of experience in data engineering, focused on designing and managing scalable data pipelines, optimizing ETL processes, and working with large datasets in the healthcare and finance sectors.
- Proficient in Python, SQL, PySpark, Spark SQL, Azure Databricks, Apache Airflow, Azure Data Factory, Snowflake, and Kafka for data engineering tasks.
- Experienced with cloud platforms including Microsoft Azure and AWS, with practical skills in real-time data streaming and processing and in managing data lake architectures.
- Skilled in developing ETL pipelines, implementing data lake solutions, and using data visualization tools such as Power BI and Tableau to generate insights and support decision-making.
- Competent in working with databases such as SQL Server and Snowflake for data management, including data cleaning, validation, and optimization.

EDUCATION
Master of Science in Computer Science
Wichita State University, Wichita, KS
Bachelor's in Electronics and Communication Engineering
JNTU College, Andhra Pradesh, India

TECHNICAL SKILLS
Big Data Technologies: Hadoop, Spark, Hive, Kafka, Snowflake, Airflow
Cloud Platforms: Azure, AWS
Languages: Python, SQL, Java 8, Shell Scripting
Data Warehousing: Snowflake, Azure Data Warehouse
Frameworks: React JS, Spring Boot, Microservices API
Databases: MySQL, Cosmos DB, MongoDB, PostgreSQL
Visualization Tools: Tableau, Power BI
Methodologies: SDLC, Agile, Waterfall
Tools and Others: JIRA, Git, Docker, Kubernetes, AWS Glue, AWS S3

WORK EXPERIENCE

CVS Health, USA | Data Engineer | Jan 2024 - Current
- Worked extensively with Agile development methodologies, managing application iterations for efficient project delivery.
- Developed and optimized PySpark DataFrames in Azure Databricks to process and transform data from Data Lake and Blob Storage.
- Created and deployed high-performance ETL pipelines using PySpark and Azure Data Factory, improving data processing efficiency.
- Used Azure Event Hubs and Apache Kafka for real-time data streaming, increasing data freshness by 30%.
- Optimized ETL pipelines and data workflows, achieving over a 25% reduction in processing time.
- Implemented a data lake architecture with Cosmos DB, improving data storage and query efficiency and reducing management time by 35%.
- Managed data drift and schema evolution challenges with StreamSets, ensuring seamless data integration.
- Scheduled and orchestrated ETL processes using Apache Airflow, creating and managing Directed Acyclic Graphs (DAGs); see the sketch following this role.
- Collaborated with Quality Engineering teams to design and execute comprehensive testing strategies, ensuring data accuracy and reliability.
- Ensured data privacy and compliance with HIPAA guidelines, implementing best practices for data anonymization and access control.
- Led the implementation and optimization of data workflows in Palantir Foundry, improving data processing efficiency and accuracy.
- Developed interactive dashboards and visualizations with Tableau, providing actionable insights and enhancing decision-making.
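For illustration only, a minimal sketch of the kind of Airflow DAG orchestration described above. This is not the candidate's production code: the DAG ID, schedule, and task callables are hypothetical placeholders.

```python
# Minimal sketch of an Airflow DAG orchestrating an ETL flow.
# All IDs, schedules, and callables are hypothetical placeholders.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def extract():
    # Placeholder: pull raw records from a source system.
    print("extracting raw data")


def transform():
    # Placeholder: clean and reshape the extracted records.
    print("transforming data")


def load():
    # Placeholder: write transformed records to the warehouse.
    print("loading data")


with DAG(
    dag_id="example_etl_pipeline",
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    extract_task = PythonOperator(task_id="extract", python_callable=extract)
    transform_task = PythonOperator(task_id="transform", python_callable=transform)
    load_task = PythonOperator(task_id="load", python_callable=load)

    # Directed Acyclic Graph: extract -> transform -> load
    extract_task >> transform_task >> load_task
```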
Vaayuja Info Solution, India | Data Engineer | May 2019 - Aug 2022
- Engaged in the analysis, design, and development phases of the Software Development Life Cycle (SDLC) in an Agile environment, using JIRA and GitHub for project management and version control.
- Developed cloud-based data pipelines and Spark applications on AWS, using AWS S3 for data staging and Redshift for data migration.
- Designed and implemented end-to-end data pipelines with StreamSets for efficient data ingestion and transformation.
- Employed Spark Streaming to preprocess streaming data and developed Spark applications for data validation, cleansing, transformation, and custom aggregation; used Spark SQL for in-depth data analysis (see the sketch after this section).
- Developed REST APIs using Python with the Flask and Django frameworks to integrate data from various sources.
- Implemented and maintained Apache Airflow DAGs for orchestrating ETL processes, leading to streamlined and automated data pipelines.
- Optimized big data workflows using Hadoop, including MapReduce and HDFS, for efficient data processing and storage.
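For illustration only, a minimal PySpark sketch of the validation, cleansing, and Spark SQL aggregation pattern described above. The bucket paths, column names, and cleansing rules are hypothetical placeholders, not the actual pipelines.

```python
# Minimal PySpark sketch of a validate -> cleanse -> aggregate flow.
# Paths, column names, and rules are hypothetical placeholders.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("cleansing-example").getOrCreate()

# Read raw staged data (e.g., from an S3 landing zone).
raw = spark.read.json("s3a://example-bucket/landing/events/")

# Validation and cleansing: drop malformed rows, normalize types, dedupe.
clean = (
    raw.filter(F.col("event_id").isNotNull())
       .withColumn("event_ts", F.to_timestamp("event_ts"))
       .dropDuplicates(["event_id"])
)

# Custom aggregation via Spark SQL for downstream analysis.
clean.createOrReplaceTempView("events")
daily_counts = spark.sql("""
    SELECT date(event_ts) AS event_date, count(*) AS events
    FROM events
    GROUP BY date(event_ts)
    ORDER BY event_date
""")

# Write the aggregate out for loading into the warehouse.
daily_counts.write.mode("overwrite").parquet(
    "s3a://example-bucket/curated/daily_counts/"
)
```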