Candidate's Name
DATA ENGINEER
Baltimore, USA | PHONE NUMBER AVAILABLE | EMAIL AVAILABLE | LinkedIn
SUMMARY
Data Engineer with almost 4 years of experience in designing, implementing, and optimizing large-scale data solutions
across healthcare and financial sectors.
Proven expertise in cloud-based data architecture, particularly with Microsoft Azure services including Azure Data
Factory, Databricks, and Stream Analytics.
Strong background in data visualization and reporting using tools like Tableau and Power BI to drive business insights
and support decision-making.
Strong proficiency in big data technologies, including Apache Spark, Kafka, and Hadoop, with experience processing
and analyzing datasets exceeding 200 TB.
Skilled in data warehouse design and optimization, utilizing dimensional modeling techniques, star/snowflake schemas,
and slowly changing dimensions (SCD) implementations.
Experienced in developing robust ETL/ELT pipelines, integrating diverse data sources and APIs, and ensuring data
quality and governance.
Proficient in multiple programming languages and tools, including Python, SQL, PySpark, and dbt, for efficient data
transformation and analysis.
Adept at collaborating with cross-functional teams, documenting data processes, and driving internal improvements to
enhance overall data management efficiency.
EXPERIENCE
Data Engineer | Optum, MD Aug 2023 - Present
Designed and implemented scalable ETL processes that leveraged Apache Spark for distributed data processing and
Python for data transformation, reducing data processing times by 40% and increasing overall system efficiency.
Extracted and consolidated healthcare data from diverse sources like Electronic Health Record (EHR) systems and
claims databases, ensuring efficient storage and retrieval in AWS S3 for further analysis and reporting.
Designed and developed visually compelling, interactive dashboards in Tableau to enable stakeholders to monitor key
performance indicators (KPIs) in real-time, driving data-driven decision-making.
Implemented and fine-tuned machine learning models for patient risk stratification, increasing the predictive accuracy
by 20%, which improved decision-making in clinical and operational contexts.
Developed a star schema with a central fact table for patient events (claims, medical visits) and multiple dimension
tables (patient demographics, provider information) to streamline reporting and analysis.
Identified key entities such as Patients, Healthcare Providers, Claims, EHR Records, Facilities, and Medications,
ensuring that core healthcare data is well-represented in the ER diagram.
Utilized MapReduce to process large-scale healthcare datasets (exceeding 10TB) from Electronic Health Records
(EHR), claims, and patient data efficiently, enabling distributed parallel processing across multiple nodes.
Applied role-based security controls in the data warehouse to ensure sensitive healthcare data was accessible only to
authorized users, adhering to HIPAA regulations for data privacy and protection.
Worked alongside data science teams to deploy predictive models using Scikit-learn and TensorFlow, enhancing the
platform's capabilities for predictive analytics and improving risk stratification for patients.
Conducted daily stand-up meetings to track progress, address blockers, and align on priorities, ensuring the team
stayed on track with healthcare data processing and reporting tasks.
Data Engineer | Zensar Technology, India Dec 2019 - Aug 2022
Implemented real-time data streaming solutions using Kafka to enhance transaction monitoring and fraud detection
capabilities, reducing risk exposure by 25% by providing up-to-date insights and alerts.
Leveraged MongoDB to store unstructured and semi-structured financial data, such as transaction logs and audit trails,
allowing flexible schema designs and scalable storage solutions for large volumes of data.
Integrated Hive into ETL workflows to perform batch data processing and transformation tasks, enabling efficient
extraction, transformation, and loading of financial data from various sources into a centralized repository.
Applied query optimization techniques in Snowflake, including result caching and automatic clustering, to enhance
query performance and reduce execution times for complex financial queries.
Implemented data quality checks and cleansing procedures to ensure the accuracy and consistency of financial data
before it was loaded into the data warehouse, enhancing data integrity.
Developed and optimized ETL pipelines using PySpark in Azure Databricks, automating the extraction,
transformation, and loading of financial data into the data warehouse, and reducing ETL processing time by 30%.
Automated regulatory reporting processes with Apache Airflow, creating reliable workflows that ensured timely and
accurate compliance reporting. This automation cut down manual reporting efforts by 40% and minimized errors.
Used Hadoop Distributed File System (HDFS) to store and manage large volumes of financial data across a distributed
cluster, providing scalable and reliable storage solutions.
Employed Python for data transformation and cleaning tasks, leveraging Pandas to handle large datasets, remove
duplicates, fill missing values, and perform data normalization to ensure data quality.
Utilized advanced SQL joins and aggregation functions to combine and analyze data from multiple financial tables,
such as calculating total transaction volumes and analyzing spending patterns.
Leveraged Azure Blob Storage for storing unstructured data, such as financial reports and transaction logs, providing
scalable and cost-effective storage solutions.
Developed predictive models to forecast financial metrics such as revenue trends, credit risk, and market movements,
using machine learning algorithms to enhance decision-making and strategic planning.
Used Excel for in-depth data analysis and financial reporting, leveraging features like pivot tables and charts to
summarize and visualize financial data effectively.
SKILLS
Methodologies: SDLC, Agile, Waterfall
Programming Languages: Python, SQL, R
Packages: NumPy, Pandas, Matplotlib, SciPy, Scikit-learn, TensorFlow, Seaborn
Visualization Tools: Tableau, Power BI, Advanced Excel (Pivot Tables, VLOOKUP), QuickSight
IDEs: Visual Studio Code, PyCharm, Jupyter Notebook, IntelliJ
Databases: MySQL, PL/SQL, MSSQL, PostgreSQL, MongoDB, SQL Server
Data Engineering Concepts: Apache Spark, Apache Hadoop, Apache Kafka, Apache Beam, ETL/ELT, PySQL, PySpark
Cloud Platforms: Microsoft Azure (Azure Blobs, Databricks, Data Lake), Amazon Web Services (AWS)
Other Technical Skills: SSIS, SSRS, SSAS, Maven, Docker, Kubernetes, Jenkins, Terraform, Informatica, Talend,
Snowflake, Google BigQuery, Data Quality and Governance, Machine Learning Algorithms,
Natural Language Processing, Big Data, Advanced Analytics, Statistical Methods, Data Mining,
Data Visualization, Data Warehousing, Data Transformation, Critical Thinking, Communication Skills,
Presentation Skills, Problem-Solving
Version Control Tools: Git, GitHub
Operating Systems: Windows, Linux, Mac OS
EDUCATION
Master of Professional Studies in Data Science | University of Maryland, Baltimore County, Maryland, USA
Bachelor of Technology in EEE | Chaitanya Bharathi Institute of Technology, Gandipet, Hyderabad, Telangana, India