Name: Candidate's Name
Email: EMAIL AVAILABLE
Phone: PHONE NUMBER AVAILABLE
LinkedIn: LINKEDIN LINK AVAILABLE

Data Engineer

PROFESSIONAL SUMMARY:
Total 10+ years of IT experience in Data Warehousing, Data Engineering, Data Integration, Big Data, ETL/ELT, Microservices, and Business Intelligence (BI).
As a big data architect and engineer, specializes in AWS, Azure, GCP, Cloudera, the Hadoop ecosystem, Spark/PySpark/Scala, Databricks, Hive, Redshift, Snowflake, relational databases, tools such as Tableau, Airflow, DBT, and Presto/Athena, and Data DevOps frameworks/pipelines, with Python scripting skills for building data solutions.
Expertise spans AWS (EMR, EKS, Step Functions, Glue, Lambda) and Azure Data Platform services (Data Lake, ADF, Databricks), with strong proficiency in Python, Scala, Java, and SQL for data processing and warehousing using Oracle, MySQL, PostgreSQL, DynamoDB, and NoSQL databases (MongoDB, Cassandra).
Extensive experience in Big Data analytics using the Hadoop architecture (HDFS, YARN, MapReduce, Pig, Hive, HBase, Spark, Drill, Flume, Oozie, Sqoop), with a focus on Spark integration with Cassandra, Zookeeper, and Kafka for real-time data processing.
Highly experienced in importing and exporting data between HDFS and relational database management systems using Sqoop.
Deep understanding of Hadoop architecture, including hands-on experience with components such as Job Tracker, Task Tracker, Name Node, and Data Node.
Expertise in managing distributed computing architectures and optimizing data storage solutions.
Over 7 years of experience building and optimizing ETL systems using a variety of tools and in-memory computing frameworks, including Apache Spark, Apache Flink, Redis, Talend, and Kafka.
Expertise in scheduling and maintaining data pipelines with Apache Airflow, Oozie, and NiFi, ensuring efficient data ingestion, transformation, and migration across cloud and on-premises environments with tools such as Hive, Sqoop, and AWS Glue.
Experienced with Azure transformation projects, implementing ETL and data movement solutions using Azure Data Factory (ADF) and SSIS.
Implemented various components of OLAP systems using ETL tools such as OWB, WhereScape RED, Pentaho, and Informatica PWC.
Skilled in text analytics and data visualization using R, Python, Tableau, DAX, Platfora, SSRS, and Power BI; designed and implemented dashboards for actionable insights and strategic decision-making.
Proficient in Snowflake schema design, normalizing dimension tables and creating sub-dimensions for optimized data querying and reporting.
Experience with the Snowflake cloud data warehouse and AWS S3 buckets for integrating data from multiple source systems, including loading nested JSON-formatted data into Snowflake tables.
Extensive experience with AWS Cloud Services, including EC2, S3, EMR, Lambda, CloudWatch, RDS, Terraform, SQS, ECS, EFS, AMI, DynamoDB, Quicksight, and Glue; developed serverless architectures and automated cloud infrastructure management using Terraform and AWS CloudFormation.
Extensive experience developing a large greenfield application using AWS Cognito, Lambda, API Gateway, a Node.js backend, Postgres, and a React/Redux front end.
Hands-on expertise in DevOps practices, including CI/CD pipeline development using Jenkins, Ansible, Docker, and Kubernetes; managed code versioning with GitHub, Bitbucket, and SVN, and automated build processes using Maven, Gradle, and SBT.
Proficient in Java and J2EE technologies, with experience in Core Java, Servlets, JSP, EJB, JDBC, Spring, Struts, and Hibernate; developed RESTful web services and integrated them with modern data processing frameworks.
Extensive hands-on experience with diverse data formats, including JSON, XML, CSV, ORC, Parquet, and Avro.
Developed and optimized scalable data processing pipelines using Kinesis, Spark SQL, Spark Streaming, Kafka, and Python, ensuring efficient handling of large datasets in real-time and batch processing environments (a minimal streaming sketch follows this summary).
Agile Collaboration: Actively participated in Agile ceremonies, including daily stand-ups, sprint planning, and retrospectives, applying strong problem-solving and teamwork skills to address challenges, enhance team collaboration, and drive continuous learning, continuous delivery, and improvement.
Presentations and Reporting: Delivered clear and effective presentations to both technical and non-technical audiences, articulating problem-solving approaches, project progress, and solutions to ensure transparency, team alignment, and stakeholder buy-in.
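The streaming bullet above references Kafka plus Spark SQL/Spark Streaming pipelines in Python. The sketch below illustrates that general pattern with Spark Structured Streaming; the broker address, topic name, event schema, and S3 paths are illustrative assumptions rather than details from the resume, and the Kafka source additionally requires the spark-sql-kafka connector on the Spark classpath.

```python
# Minimal sketch: consume JSON events from a Kafka topic, apply a Spark SQL
# transformation, and append the result to Parquet. Names are assumptions.
from pyspark.sql import SparkSession
from pyspark.sql.functions import from_json, col
from pyspark.sql.types import StructType, StructField, StringType, DoubleType, TimestampType

spark = SparkSession.builder.appName("kafka-events-stream").getOrCreate()

event_schema = StructType([
    StructField("event_id", StringType()),
    StructField("event_ts", TimestampType()),
    StructField("amount", DoubleType()),
])

raw = (spark.readStream
       .format("kafka")
       .option("kafka.bootstrap.servers", "localhost:9092")   # assumed broker
       .option("subscribe", "events")                         # assumed topic
       .option("startingOffsets", "latest")
       .load())

events = (raw.selectExpr("CAST(value AS STRING) AS json")
          .select(from_json(col("json"), event_schema).alias("e"))
          .select("e.*")
          .filter(col("amount") > 0))

query = (events.writeStream
         .format("parquet")
         .option("path", "s3a://example-bucket/events/")                      # assumed output path
         .option("checkpointLocation", "s3a://example-bucket/checkpoints/events/")
         .outputMode("append")
         .start())

query.awaitTermination()
```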
TECHNICAL SKILLS:
Programming & Scripting Languages: Python, Scala, Java, SQL, T-SQL, Shell Script, Bash, Unix
Big Data Tools: Hadoop, HDFS, Hive, Apache Spark, PySpark, HBase, Kafka, Beam, YARN, Sqoop, Impala, Oozie, Pig, MapReduce, Talend, Zookeeper, Flume
Cloud Services: AWS (EC2, S3, EMR, SNS, RDS, Step Functions, Glue, Kinesis, Lambda, Redshift, Quicksight); Azure (Synapse Analytics, SQL Azure, Data Factory, Insights, Monitoring, Data Lake, HDInsight); GCP (GCS, GCE, Cloud Storage, Dataproc, BigQuery, Pub/Sub, Dataflow, Cloud Functions)
Relational Databases: Oracle, SQL Server, MySQL, PostgreSQL, DynamoDB
NoSQL Databases: Cassandra, MongoDB, HBase
Data Modeling Tools: Erwin Data Modeler, ER Studio
Operating Systems: Unix, Linux, Windows, Mac OS
SDLC: Agile, Scrum, Waterfall
ETL and Data Visualization: Informatica, Talend, Tableau, SSRS, Power BI
Python Libraries: Pandas, NumPy, SciPy, Matplotlib
Build Tools: Apache Maven, GitLab, Jenkins, Bitbucket
Container/Cluster Managers: Docker, Kubernetes, EKS, AKS, Bazel
Version Control: Git, GitLab, SVN
Hadoop Distributions: Cloudera, AWS EMR, Azure Data Factory
Frameworks: Django, Flask, WebApp2, Spring Boot, NodeJS

PROFESSIONAL EXPERIENCE:
Client: UnitedHealth Group, Phoenix, AZ    November 2022 - Present
Senior Data Engineer
Conducted data analysis and ETL processes using PySpark, Python, and SQL (PostgreSQL, MSSQL) to extract, transform, and load data from various sources into AWS S3 and data warehouses such as Redshift and Snowflake.
Developed and maintained scalable data pipelines using PySpark on AWS EMR, ensuring efficient processing of large healthcare datasets.
Automated ETL workflows with AWS Glue to enhance pipeline reliability and reduce manual effort, integrating with Lambda for event-driven processes.
Designed incremental, historical, and Change Data Capture (CDC) data loading strategies on AWS Databricks and Redshift, ensuring up-to-date and consistent data in the data warehouses.
Leveraged AWS Kinesis and Kafka for real-time streaming, ensuring high-throughput ingestion with minimal latency from various healthcare sources.
Utilized SNS notifications to alert teams about job execution status and pipeline exceptions for proactive issue resolution.
Orchestrated complex workflows using Apache Airflow and Step Functions, managing the end-to-end execution and monitoring of ETL jobs on Databricks, Snowflake, and EMR.
Implemented Data Build Tool (DBT) for transformation workflows, enabling modularization, version control via Git, and automation of RDS-based SQL transformations.
Created data lakes and optimized data warehouses using AWS S3, Redshift, Athena, and Quicksight, enhancing data access and visualization for stakeholders.
Led data migration from on-premises databases to RDS and Redshift, ensuring minimal downtime and data integrity throughout the transition.
Conducted data quality checks within pipelines using Great Expectations and custom Python scripts, automating validations on S3 and Redshift staging areas.
Designed and maintained dimensional data models (star schemas, snowflake schemas) in Redshift to support efficient querying and reporting.
Applied data encryption and masking techniques using AWS KMS to secure sensitive healthcare data, ensuring compliance with GDPR, HIPAA, and other regulatory frameworks.
Utilized Lambda to trigger data pipelines based on S3 events, ensuring real-time processing of incoming files and dynamic ETL execution (a minimal sketch follows at the end of this section).
Collaborated with cross-functional teams to gather requirements and deliver data solutions aligned with business needs, leveraging Quicksight dashboards for reporting and monitoring.
Mentored junior engineers on DBT, Snowflake, and Redshift best practices while reviewing code in Git to ensure high code quality.
Created and maintained technical documentation for ETL processes, data models, and tools, facilitating knowledge transfer and onboarding of new team members.
Managed data governance policies to ensure data privacy, security, and compliance across AWS services, working closely with legal and compliance teams.
Environment: AWS (EMR, Glue, S3, Redshift, Athena, Kinesis, Lambda, Quicksight), Spark, PySpark, Python, PostgreSQL, Snowflake, Kafka, Airflow, Tableau, GitHub
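One bullet in this section describes Lambda-triggered pipelines driven by S3 events and integrated with Glue. The sketch below shows that pattern in outline only; the Glue job name, bucket layout, and argument keys are illustrative assumptions rather than details from this engagement.

```python
# Hypothetical AWS Lambda handler: invoked by an S3 ObjectCreated event and
# starts an AWS Glue ETL job for each newly arrived file.
import json
import logging

import boto3

logger = logging.getLogger()
logger.setLevel(logging.INFO)

glue = boto3.client("glue")

GLUE_JOB_NAME = "healthcare-claims-etl"  # assumed Glue job name


def lambda_handler(event, context):
    """Start one Glue job run per S3 object referenced in the event."""
    started_runs = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        key = record["s3"]["object"]["key"]
        logger.info("New object s3://%s/%s, starting Glue job %s", bucket, key, GLUE_JOB_NAME)

        response = glue.start_job_run(
            JobName=GLUE_JOB_NAME,
            Arguments={
                "--source_path": f"s3://{bucket}/{key}",  # custom job argument (assumed)
            },
        )
        started_runs.append(response["JobRunId"])

    return {"statusCode": 200, "body": json.dumps({"job_runs": started_runs})}
```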
Client: Barclays Solutions, Whippany, NJ    May 2021 - October 2022
Data Engineer
AWS Cloud Architecture & Migration: Architected and maintained scalable, fault-tolerant AWS environments using Terraform and CloudFormation across multiple availability zones, ensuring high availability and performance. Led the migration of legacy systems to AWS, ensuring minimal downtime and data integrity.
ETL Pipeline Development & Optimization: Developed and optimized ETL pipelines using Talend, Spark, AWS Kinesis, and Kafka for efficient data ingestion, transformation, and migration across cloud (AWS, GCP, Azure) and on-premises environments.
Real-Time Data Processing: Engineered real-time data streaming solutions using Kinesis, Kafka, Spark Streaming, Pub/Sub, and Flink to process and analyze live data, enabling real-time insights and faster decision-making.
Data Storage & Processing: Implemented data storage solutions on AWS and GCP (S3, Redshift, Google Cloud Storage) and utilized Dataproc, BigQuery, and AWS Glue for data processing and analysis. Designed and deployed BI solutions on Azure using services such as Azure Data Lake, Data Factory, and Stream Analytics.
Hadoop & Big Data Management: Managed and monitored Hadoop clusters using Cloudera, developing Spark applications for data extraction, transformation, and analysis. Proficient in Hadoop, Hive, Sqoop, and HBase, optimizing performance through partitioning and indexing techniques.
Scripting & Automation: Created custom Python and Shell scripts for ETL tasks, leveraging AWS Lambda for event-driven processing. Automated EMR cluster launches and Hadoop configurations using Python and Boto3 (see the sketch after this section).
Workflow Automation: Developed complex data workflows using Apache NiFi, Oozie, and Airflow to automate data processing tasks, improving operational efficiency and reducing manual intervention.
CI/CD & Infrastructure Management: Designed and implemented CI/CD pipelines using Jenkins and Ansible for automated build and deployment. Hands-on experience with containerization (Docker) and infrastructure management using Terraform and AWS CloudFormation.
Microservices & Serverless Architecture: Designed and deployed a microservices architecture using AWS Lambda and ECS, enabling scalable serverless applications with efficient resource management.
Collaboration & Data Product Development: Collaborated with data architects and engineers to define and implement backend requirements for data products, including aggregations, materialized views, and data tables for visualization.
Data Analysis & Insights: Performed data and statistical analysis using SAS, Python, and SQL to generate reports, analyze trends, and derive actionable insights from complex datasets.
Environment: Hadoop, Cloudera, SQL, Terraform, Splunk, RDBMS, Jira, Confluence, Shell Scripting, Zookeeper, AWS, Oracle, Git, Kafka, GraphQL, CI/CD, Jenkins, Agile, Azure Databricks, Tableau
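The Scripting & Automation bullet above mentions automating EMR cluster launches with Python and Boto3. Below is a minimal sketch of that pattern under stated assumptions: the cluster name, EMR release, instance types, IAM roles, and S3 paths are placeholders, not details from this project.

```python
# Hypothetical helper that launches a transient EMR cluster with one Spark step
# using boto3, terminating the cluster when the step completes.
import boto3

emr = boto3.client("emr", region_name="us-east-1")  # assumed region


def launch_spark_cluster(script_s3_path: str) -> str:
    """Launch an EMR cluster, run one spark-submit step, then auto-terminate."""
    response = emr.run_job_flow(
        Name="etl-transient-cluster",                       # assumed cluster name
        ReleaseLabel="emr-6.15.0",                          # assumed EMR release
        LogUri="s3://example-logs-bucket/emr/",             # assumed log bucket
        Applications=[{"Name": "Spark"}, {"Name": "Hadoop"}],
        Instances={
            "InstanceGroups": [
                {"InstanceRole": "MASTER", "InstanceType": "m5.xlarge", "InstanceCount": 1},
                {"InstanceRole": "CORE", "InstanceType": "m5.xlarge", "InstanceCount": 2},
            ],
            "KeepJobFlowAliveWhenNoSteps": False,           # terminate after the step finishes
        },
        Steps=[
            {
                "Name": "spark-etl-step",
                "ActionOnFailure": "TERMINATE_CLUSTER",
                "HadoopJarStep": {
                    "Jar": "command-runner.jar",
                    "Args": ["spark-submit", "--deploy-mode", "cluster", script_s3_path],
                },
            }
        ],
        JobFlowRole="EMR_EC2_DefaultRole",                  # assumed default roles
        ServiceRole="EMR_DefaultRole",
        VisibleToAllUsers=True,
    )
    return response["JobFlowId"]


if __name__ == "__main__":
    print(launch_spark_cluster("s3://example-code-bucket/jobs/etl_job.py"))
```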
Client: DTCC, Dallas, TX    November 2019 - May 2021
Big Data/Scala Developer
Responsibilities:
Worked with Sqoop to efficiently import and export data between HDFS and relational databases or mainframe systems, ensuring seamless data transfers across platforms.
Implemented and optimized a Hadoop cluster with Cloudera CDH4, enabling efficient data integration for large-scale software systems.
Optimized data retrieval from big data sources using GraphQL queries within Scala applications, reducing API latency and improving data flow performance.
Developed automated workflows in Apache Oozie for loading and preprocessing data with Hive, minimizing manual effort.
Converted complex Hive/SQL queries into Spark transformations using RDDs and Scala, improving data processing speed and flexibility (a sketch of this pattern follows this section).
Migrated the platform from Cloudera to AWS EMR, ensuring a smooth transition with minimal disruption and performance improvements.
Designed and built analytical components using Spark, Scala, and Spark Streaming to process real-time data streams.
Created and managed Hive tables, loaded data, and authored queries that triggered backend MapReduce jobs, supporting efficient data analysis.
Developed a web application using the HBase and Hive APIs to compare schemas between HBase and Hive tables, enhancing data consistency and management.
Wrote Spark scripts to transfer and append data between temporary HBase tables and target tables, maintaining data integrity during transformations.
Built complex, multi-step data pipelines using Spark, enabling efficient processing of large datasets for downstream analytics.
Ingested massive amounts of data into HDFS and Cassandra using Kafka, ensuring high throughput and real-time data availability.
Monitored YARN applications to track resource allocation and troubleshoot cluster-related issues, ensuring system stability and performance.
Aggregated and staged large amounts of log data with Apache Flume for subsequent processing and analysis within HDFS.
Developed ETL workflows using Pig by writing Pig Latin queries that ran as MapReduce jobs, streamlining data preparation.
Automated jobs and data processing tasks using Unix/Linux shell scripting, scheduling workflows and integrating with Hive and Pig processes.
Leveraged Zookeeper to coordinate distributed services within the cluster, ensuring consistency and failover management.
Environment: Hadoop, HDFS, Hive, Core Java, Sqoop, Spark, Scala, Cloudera CDH4, Oracle, Kerberos, SFTP, Impala, Jira, Wiki, Alteryx, Teradata, Shell/Perl Scripting, Kafka, AWS EC2, S3, EMR, Agile
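One bullet above describes converting Hive/SQL queries into Spark transformations. The original work used Scala and RDDs; the same idea is sketched below in PySpark with the DataFrame API for brevity, and the Hive table, column names, and query are illustrative assumptions only.

```python
# Hypothetical example: a Hive aggregation query re-expressed as Spark
# DataFrame transformations and written back as a managed table.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (SparkSession.builder
         .appName("hive-to-spark-conversion")
         .enableHiveSupport()          # read Hive tables through the metastore
         .getOrCreate())

# Original Hive query (for reference, assumed):
#   SELECT trade_date, instrument, SUM(notional) AS total_notional
#   FROM trades.settlements
#   WHERE status = 'SETTLED'
#   GROUP BY trade_date, instrument;

settlements = spark.table("trades.settlements")

daily_totals = (settlements
                .filter(F.col("status") == "SETTLED")
                .groupBy("trade_date", "instrument")
                .agg(F.sum("notional").alias("total_notional")))

# Persist the result for downstream jobs.
(daily_totals.write
 .mode("overwrite")
 .saveAsTable("trades.daily_settlement_totals"))
```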
Client: Edward Jones, St. Louis, Missouri    February 2018 - October 2019
Big Data & Hadoop Developer
Responsible for system analysis, design, development, testing, and deployment.
Designed data marts following Star Schema and Snowflake Schema methodologies, using industry-leading data tools.
Developed and maintained production-level ETL pipelines using Python, Airflow, and SQL (a minimal DAG sketch follows at the end of this section).
Worked on Snowflake database queries and wrote stored procedures for normalization.
Worked with Snowflake stored procedures alongside the corresponding DDL statements, using the JavaScript API to wrap and execute numerous SQL queries and CTEs.
Installed and configured Apache Airflow against an S3 bucket and the Snowflake data warehouse, and created DAGs to automate workflows on Airflow/Composer.
Designed and developed ETL jobs to extract data from different sources and load it into a data mart in Snowflake, and managed Snowflake clusters, including launching clusters by specifying the nodes and performing data analysis queries.
Implemented Hadoop-based data processing pipelines for large-scale data analysis and distributed storage.
Configured and managed Hadoop clusters to ensure optimal performance and resource utilization.
Utilized Hadoop's HDFS (Hadoop Distributed File System) for storing and managing vast volumes of data efficiently.
Integrated Hadoop ecosystem components such as Hive, Pig, and HBase to enable data processing.
Experience with unit testing, Test-Driven Development (TDD), and load testing.
Experienced in consuming REST web services with the Python programming language.
Used REST web services to integrate the Agile tool and XML release for the SDLC of the project.
Worked on JSON and XML data for web application assets and their attributes.
Collaborated with frontend developers using Angular to integrate Java-based backend services, ensuring seamless data exchange and user interface functionality.
Participated in requirements gathering and worked closely with the architect on design and modeling.
Rewrote an existing Java application as a Python module to deliver data in a specific format.
Used Ansible to configure and manage the infrastructure.
Worked on CloudBees Jenkins for CI/CD in the production environment, with automation via Ansible and deployment to the AWS environment (EC2).
Created Amazon EC2 cloud instances using Amazon Web Services and configured launched instances for specific applications.
Environment: Python, HDFS, REST Web Services, SDLC, Docker, NumPy, Pandas, JSON, Linux, Shell Scripting, TDD, Ansible, AWS, CI/CD, Snowflake, SQL Server 2000/2005, SSRS
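The Airflow bullets above describe automating S3-to-Snowflake ETL with Python, Airflow, and SQL. Below is a minimal DAG sketch of that pattern (Airflow 2.x style); the DAG id, schedule, and the placeholder task bodies are assumptions for illustration and would be replaced by real S3 and Snowflake access code.

```python
# Hypothetical Airflow DAG: extract a daily file from S3, transform it, and
# load it into Snowflake. Names and schedule are illustrative assumptions.
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.python import PythonOperator

default_args = {
    "owner": "data-engineering",
    "retries": 2,
    "retry_delay": timedelta(minutes=5),
}


def extract_from_s3(**context):
    # Placeholder: pull the day's extract from S3 (e.g., with boto3 or S3Hook).
    pass


def transform_records(**context):
    # Placeholder: clean and normalize the extracted records (e.g., with pandas).
    pass


def load_to_snowflake(**context):
    # Placeholder: load the transformed data into a Snowflake staging table.
    pass


with DAG(
    dag_id="s3_to_snowflake_daily",
    start_date=datetime(2019, 1, 1),
    schedule_interval="@daily",
    default_args=default_args,
    catchup=False,
) as dag:
    extract_task = PythonOperator(task_id="extract_from_s3", python_callable=extract_from_s3)
    transform_task = PythonOperator(task_id="transform_records", python_callable=transform_records)
    load_task = PythonOperator(task_id="load_to_snowflake", python_callable=load_to_snowflake)

    extract_task >> transform_task >> load_task
```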
Client: HighRadius, Hyderabad, India    June 2012 - December 2015
Java Developer
Responsibilities:
Excellent Java/J2EE application development skills with strong experience in Object-Oriented Analysis, extensively involved throughout the Software Development Life Cycle (SDLC).
Implemented various J2EE standards and an MVC framework using Struts, JSP, AJAX, and Servlets for UI design.
Used SOAP/REST for data exchange between the backend and the user interface.
Utilized Java and MySQL on a day-to-day basis to debug and fix issues with client processes.
Developed, tested, and implemented a financial-services application to bring multiple clients into a standard database format.
Created web service components using SOAP, XML, and WSDL to receive XML messages and apply business logic.
Involved in configuring WebSphere variables, queues, data sources, and servers, and in deploying EAR files to the servers.
Involved in developing the business logic using Plain Old Java Objects (POJOs) and Session EJBs.
Developed LDAP-based authentication through JNDI.
Developed and debugged the application using the Eclipse IDE.
Involved in Hibernate mappings, configuration property setup, session and transaction creation, and second-level cache setup.
Involved in backing up the database, creating dump files, and creating DB schemas from dump files.
Wrote and executed developer test cases and prepared the corresponding scope and traceability matrix.
Implemented JUnit and used JAD for debugging and to develop test cases for all the modules.
Hands-on experience with Sun ONE Application Server, WebLogic Application Server, WebSphere Application Server, WebSphere Portal Server, and J2EE application deployment technology.
Environment: Java Multithreading, JDBC, Hibernate, Struts, Collections, Maven, Subversion, SQL, JSP, SOAP, NodeJS, Spring Boot, JUnit, Oracle, XML, PuTTY, Eclipse, Waterfall

EDUCATION:
Master of Science (M.S.) in Computer Science, Governors State University    Jan 2016 - Dec 2017