Candidate's Name
DATA ENGINEER
Location: Harrison, New Jersey
Mobile no: PHONE NUMBER AVAILABLE
Gmail Id: EMAIL AVAILABLE

SUMMARY
Seasoned Data Engineer with 4 years of experience architecting and deploying robust, scalable data pipelines and ETL processes for complex big data environments.
Proficient in Python, SQL, and Bash scripting for data extraction, transformation, and loading from diverse sources, including relational databases, NoSQL databases, and streaming data.
Extensive hands-on experience with Apache Spark, the Hadoop ecosystem (HDFS, MapReduce, Hive, Pig, Sqoop, Flume, Kafka), and cloud-based data processing platforms such as AWS EMR and Google Dataproc.
Skilled in designing and implementing data lakes and data warehouses using technologies such as Apache Hive, Apache Impala, and cloud-based services like Amazon Athena and Google BigQuery.
Expertise in building real-time streaming data pipelines using Apache Kafka, Apache Spark Streaming, and AWS Kinesis for ingesting and processing high-volume, low-latency data streams.
Proficient in data modeling, schema design, and optimizing data structures for efficient querying and analysis using formats such as Apache Parquet, Apache Avro, and Apache ORC.
Experienced in automating and orchestrating complex data workflows using Apache Airflow, AWS Data Pipeline, and Google Cloud Dataflow (see the workflow sketch after this summary).
Adept at monitoring, troubleshooting, and optimizing data pipelines for performance, fault tolerance, and scalability using tools like the Apache Spark UI, Hadoop YARN, and AWS CloudWatch.
Skilled in implementing data security and governance practices, including data encryption, access control, and auditing, in compliance with industry standards and regulations.
Strong collaboration skills with cross-functional teams, including data scientists, analysts, and stakeholders, to understand business requirements and translate them into robust data solutions.
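For illustration, a minimal sketch of the kind of Airflow-orchestrated workflow referenced in the summary follows. The DAG id, schedule, retry settings, and the placeholder extract/transform/load callables are hypothetical assumptions, not code from any employer project.

```python
# Minimal, illustrative Airflow DAG for a daily extract -> transform -> load
# workflow. All names and helpers below are hypothetical placeholders.
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.python import PythonOperator


def extract():
    # Placeholder: pull raw records from a source system into a staging area.
    print("extracting raw source data to staging")


def transform():
    # Placeholder: clean, deduplicate, and reshape the staged records.
    print("cleaning and reshaping staged data")


def load():
    # Placeholder: write the transformed records to the warehouse.
    print("loading curated data into the warehouse")


with DAG(
    dag_id="example_daily_etl",  # hypothetical DAG name
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",
    catchup=False,
    default_args={"retries": 2, "retry_delay": timedelta(minutes=5)},
) as dag:
    t_extract = PythonOperator(task_id="extract", python_callable=extract)
    t_transform = PythonOperator(task_id="transform", python_callable=transform)
    t_load = PythonOperator(task_id="load", python_callable=load)

    # Linear dependency: extract runs first, then transform, then load.
    t_extract >> t_transform >> t_load
```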
SKILLS
Methodologies: SDLC, Agile, Waterfall
Programming Languages: Python, SQL, Java, R
Packages: NumPy, Pandas, Matplotlib, SciPy, Scikit-learn, TensorFlow, Seaborn
Visualization Tools: Tableau, Power BI, Advanced Excel (Pivot Tables, VLOOKUP)
IDEs: Visual Studio Code, PyCharm, Jupyter Notebook, IntelliJ
Cloud Platforms: Amazon Web Services (AWS), Microsoft Azure, Google Cloud Platform
Databases: MySQL, PostgreSQL, MongoDB, T-SQL
Data Engineering Concepts: Apache Spark, Apache Hadoop, Apache Kafka, Apache Beam, ETL/ELT
Data Warehousing and Storage: Amazon Redshift, Google BigQuery, Snowflake, Azure Synapse Analytics, Oracle Exadata
Data Modeling and Integration: ER/Studio, Erwin Data Modeler, dbt (data build tool), Apache Airflow, AWS Glue
Other Technical Skills: Data Lake, SSIS, SSRS, SSAS, Docker, Kubernetes, Jenkins, Terraform, Informatica, Talend, Data Quality and Governance, Machine Learning Algorithms, Natural Language Processing, Big Data, Advanced Analytics, Statistical Methods, Data Mining, Data Visualization, Data Warehousing, Data Transformation, Critical Thinking, Communication Skills, Presentation Skills, Problem-Solving
Version Control Tools: Git, GitHub
Operating Systems: Windows, Linux, macOS

EDUCATION
Master of Science in Computer Science, Case Western Reserve University, Cleveland, Ohio
Bachelor of Technology in Computer Science, GITAM University, Visakhapatnam, Andhra Pradesh, India

EXPERIENCE
Data Engineer, Johnson & Johnson, New Brunswick, NJ (Oct 2023 - Present)
Leading a critical healthcare data integration project at Johnson & Johnson, implementing a HIPAA-compliant cloud-based data lake solution to consolidate patient data from multiple clinical trials and research studies.
Developing and maintaining ETL pipelines using Python and Apache Spark to process and transform large-scale healthcare datasets, including electronic health records (EHR) and clinical trial data (see the sketch after this role).
Designing and implementing a secure, scalable data model using SQL Server and MongoDB to support efficient storage and retrieval of sensitive patient information for advanced analytics and research purposes.
Utilizing Azure Data Factory and Azure Databricks to orchestrate and automate HIPAA-compliant data workflows, ensuring seamless integration of various healthcare data sources.
Creating interactive dashboards and reports using Power BI and Tableau, providing real-time insights on clinical trial progress, patient outcomes, and drug efficacy to research teams and senior management.
Collaborating closely with biostatisticians, data scientists, and clinical researchers to translate complex healthcare requirements into technical specifications and data-driven solutions.
Implementing rigorous data quality checks and validation processes using Python and SQL to ensure the accuracy and consistency of patient data throughout the analytics pipeline.
Developing and executing automated testing scripts using Selenium and Python to validate data integrity and system functionality, focusing on maintaining patient data privacy and security.
Optimizing database queries and data processing algorithms, resulting in a 35% improvement in the speed of clinical trial data analysis and reporting.
Leveraging Azure cloud computing technologies to scale data processing capabilities for handling large volumes of genomic and proteomic data in personalized medicine research.
Conducting regular code reviews and implementing best practices for version control using Git, ensuring a high-quality, maintainable, and compliant codebase in line with healthcare industry standards.
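For illustration, a minimal PySpark sketch of the kind of ETL and data-quality step described in the role above. The mount path, column names, deduplication keys, and the 95% pass-rate threshold are hypothetical assumptions; no real patient data or employer code is shown.

```python
# Minimal, illustrative PySpark ETL with a simple data-quality gate.
# Paths, columns, and thresholds are hypothetical placeholders.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("clinical_trial_etl_example").getOrCreate()

# Extract: read a raw, de-identified extract from the data lake (hypothetical path).
raw = spark.read.parquet("/mnt/datalake/raw/clinical_trials/")

# Transform: standardize types, drop invalid rows, and deduplicate on hypothetical keys.
cleaned = (
    raw.withColumn("visit_date", F.to_date("visit_date"))
       .filter(F.col("patient_id").isNotNull())
       .dropDuplicates(["patient_id", "visit_date"])
)

# Data-quality check: fail the job if too many rows were rejected (95% is an example threshold).
total, kept = raw.count(), cleaned.count()
if total > 0 and kept / total < 0.95:
    raise ValueError(f"Data quality check failed: only {kept}/{total} rows passed")

# Load: write the curated dataset back to the lake, partitioned for query efficiency.
cleaned.write.mode("overwrite").partitionBy("visit_date").parquet(
    "/mnt/datalake/curated/clinical_trials/"  # hypothetical path
)
```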
Data Engineer, Capgemini, India (May 2020 - Jul 2022)
Designed and implemented robust data pipelines using Apache Spark and Google Cloud Platform (GCP) for a major banking client's credit risk assessment project.
Developed ETL processes to extract financial data from multiple sources, including transaction systems, credit bureaus, and internal databases.
Optimized data processing workflows for credit scoring models, reducing execution time by 30% and improving risk assessment turnaround.
Implemented data quality checks using SQL and custom Python scripts to ensure the accuracy of financial metrics and customer information (see the sketch at the end of this role).
Collaborated with risk analysts and data scientists to translate complex credit risk algorithms into efficient data transformations.
Utilized GCP services including BigQuery, Cloud Storage, and Dataflow to build a scalable credit risk data warehouse.
Created comprehensive documentation for the credit risk data model, ETL processes, and data lineage to ensure regulatory compliance.
Implemented stringent data security measures to protect sensitive customer financial information, adhering to banking industry regulations.
Developed real-time dashboards using Data Studio to visualize key credit risk indicators for executive stakeholders.
Participated in agile sprints, collaborating closely with the credit risk modeling team to iteratively improve data pipelines.
Mentored junior data engineers on financial data handling best practices and GCP-based ETL development.
Stayed current with financial technology trends, attending fintech workshops focused on blockchain and AI in credit risk assessment.
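For illustration, a minimal sketch of a SQL-based data-quality check against a BigQuery credit-risk table of the kind described in the role above. The project, dataset, table, and column names, and the score-range rule, are hypothetical assumptions; no client data or code is shown.

```python
# Minimal, illustrative BigQuery data-quality check for a credit-risk load.
# Project, dataset, table, columns, and rules below are hypothetical placeholders.
from google.cloud import bigquery

client = bigquery.Client(project="example-project")  # hypothetical project id

QUALITY_CHECK_SQL = """
SELECT
  COUNTIF(customer_id IS NULL)                  AS missing_customer_id,
  COUNTIF(credit_score NOT BETWEEN 300 AND 850) AS out_of_range_scores,
  COUNT(*)                                      AS total_rows
FROM `example-project.risk.credit_scores`       -- hypothetical table
WHERE load_date = @load_date
"""

job = client.query(
    QUALITY_CHECK_SQL,
    job_config=bigquery.QueryJobConfig(
        query_parameters=[
            bigquery.ScalarQueryParameter("load_date", "DATE", "2022-07-01")
        ]
    ),
)
row = list(job.result())[0]

# Fail the pipeline run if any hard rule is violated.
if row.missing_customer_id > 0 or row.out_of_range_scores > 0:
    raise ValueError(
        f"Credit-risk data quality check failed: {row.missing_customer_id} null ids, "
        f"{row.out_of_range_scores} out-of-range scores out of {row.total_rows} rows"
    )
```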