Name: Ravikanth Reddy
Email: EMAIL AVAILABLE
Contact Number: PHONE NUMBER AVAILABLE

Professional Summary:
4+ years of professional IT experience as a Data Engineer, covering data analysis, design, coding, and development of data warehousing implementations across the Retail, Financial, and Banking industries.
Skilled in managing data analytics, data processing, machine learning, artificial intelligence, and data-driven projects.
Proficient in handling and ingesting terabytes of streaming data (Kafka, Spark Streaming, Storm) and batch data, with a focus on automation.
Experienced in managing hosting plans for Azure infrastructure and in implementing and deploying workloads on Azure virtual machines (VMs).
Successfully implemented, set up, and worked on various Hadoop distributions (Cloudera, Hortonworks, AWS EMR, Azure HDInsight, GCP Dataproc).
Skilled in data ingestion, extraction, and transformation using ETL processes with AWS Glue, Lambda, AWS EMR, and Azure Databricks.
Proficient in designing scalable and efficient data architectures on Azure, leveraging services such as Azure Data Lake, Azure Data Factory, Azure Databricks, Azure Synapse, and Power BI.
Experienced in developing Python and shell scripts to extract, load, and transform data, with working knowledge of AWS Redshift.
Excellent knowledge of distributed components such as HDFS, Job Tracker, Task Tracker, NameNode, DataNode, and the MapReduce programming paradigm.
Expertise with tools in the Hadoop ecosystem, including Spark, Hive, HDFS, MapReduce, Sqoop, Kafka, YARN, Oozie, and HBase.
Experienced in writing real-time processing and core jobs using Spark Streaming with Kafka as a data pipeline system.
Experience working with multi-cluster and virtual warehouses in Snowflake.
Extensive experience migrating on-premises data into the cloud, along with implementing CI/CD pipelines using Jenkins, CodePipeline, Azure DevOps, Kubernetes, Docker, and GitHub.
Experience automating data engineering pipelines using proper standards and best practices (appropriate partitioning, suitable file formats, incremental loads that maintain previous state, etc.).
Developed custom SSIS components and scripts to enhance data transformation processes, improving overall ETL performance by 30%.
Experience designing and developing production-ready data processing applications in Spark using Scala/Python.
Strong experience creating efficient Spark applications for various kinds of data transformations, such as data cleansing, de-normalization, joins, and data aggregation.
Experience fine-tuning Spark applications using concepts such as broadcasting, increasing shuffle parallelism, caching/persisting DataFrames, and sizing executors appropriately to use the cluster's available resources effectively (a minimal sketch follows this summary).
Experience manipulating and analyzing large datasets and finding patterns and insights within structured and unstructured data.
Good knowledge of productionizing machine learning pipelines (featurization, learning, scoring, evaluation), primarily using Spark ML libraries.
Experience migrating data using Sqoop from HDFS to relational database systems and vice versa.
Extensive experience importing and exporting data using stream processing platforms such as Flume and Kafka.
Experienced in working with Microsoft Azure, using Azure Data Lake Gen2, Azure Data Factory, Azure Synapse Analytics, Azure Stream Analytics, Azure Databricks, Azure Blob Storage, Azure Purview, Azure Data Flow, etc.
Experience in database design using PL/SQL to write stored procedures, functions, and triggers, and strong experience writing complex queries for Oracle.
Experience in support activities such as troubleshooting, performance monitoring, and resolving production incidents.
Strong experience in object-oriented design, analysis, development, testing, and maintenance.
Experienced in agile approaches, including Extreme Programming, Test-Driven Development, and Agile Scrum.
Ability to work closely with teams to ensure high-quality and timely delivery of builds and releases.
Excellent communication skills, with an ability to understand concepts and both technical and non-technical requirements.
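A minimal PySpark sketch of the tuning techniques mentioned above (broadcast joins, shuffle parallelism, caching); table paths, column names, and configuration values are hypothetical, not taken from any of the projects below:

# Illustrative only; paths, columns, and config values are assumptions.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (
    SparkSession.builder
    .appName("tuning-sketch")
    .config("spark.sql.shuffle.partitions", "400")   # raise shuffle parallelism for large joins
    .config("spark.executor.memory", "8g")           # size executors to the cluster's capacity
    .getOrCreate()
)

orders = spark.read.parquet("s3a://example-bucket/orders/")        # large fact table (hypothetical)
dim_store = spark.read.parquet("s3a://example-bucket/dim_store/")  # small dimension table (hypothetical)

# Broadcast the small dimension so the large side avoids a shuffle
enriched = orders.join(F.broadcast(dim_store), "store_id")

# Cache a frame that several downstream aggregations reuse
enriched.cache()
daily = enriched.groupBy("store_id", "order_date").agg(F.sum("amount").alias("daily_sales"))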
Technical Skills:
Big Data Systems: Amazon Web Services (AWS), Azure, Google Cloud Platform (GCP), Cloudera Hadoop, Hortonworks Hadoop, Apache Spark, Spark Streaming, Apache Kafka, Hive, Amazon S3, AWS Kinesis
Databases: Cassandra, HBase, DynamoDB, MongoDB, BigQuery, SQL, Hive, MySQL, Oracle, PL/SQL, RDBMS, AWS Redshift, Amazon RDS, Teradata, Snowflake
Programming & Scripting: Python, Scala, PySpark, SQL, Java, Bash
ETL Data Pipelines: Apache Airflow, Sqoop, Flume, Apache Kafka, DBT, Pentaho, SSIS
Visualization: Tableau, Power BI, QuickSight, Looker, Kibana
Cluster Security: Kerberos, Ranger, IAM, VPC
Cloud Platforms: AWS, GCP, Azure
Scheduler Tools: Apache Airflow, Azure Data Factory, AWS Glue, Step Functions
Spark Framework: Spark API, Spark Streaming, Spark Structured Streaming, Spark SQL
CI/CD Tools: Jenkins, GitHub, GitLab
Operating Systems: Windows, Linux, Unix, Mac OS X

Work Experience

HSBC Bank, Atlanta, GA (Remote). Duration: Mar 2023 - Till Date
Data Engineer
Worked with Spark to improve the performance and optimize existing algorithms in Hadoop, using Spark Context, Spark SQL, DataFrames, and Spark on YARN.
Involved in file movements between HDFS and AWS S3, worked extensively with S3 buckets in AWS, and converted all Hadoop jobs to run on EMR by configuring the cluster according to the data size.
Wrote Spark applications for data validation, cleansing, transformations, and custom aggregations; imported data from different sources into Spark RDDs for processing, developed custom aggregate functions using Spark SQL, and performed interactive querying.
Designed and implemented complex ETL workflows using SSIS to extract, transform, and load data from diverse sources into data warehouses.
Collected data with Spark Streaming from an AWS S3 bucket in near real time, performed the necessary transformations and aggregations on the fly to build the common learner data model, and persisted the data in HDFS (see the streaming sketch after this section).
Worked on Oracle databases, Redshift, and Snowflake.
Created an AWS Glue job for archiving data from Redshift tables to S3 (online to cold storage) per data retention requirements, and was involved in managing S3 data layers and databases including Redshift and Postgres.
Worked on creating a star schema for drill-down reporting.
Created PySpark procedures, functions, and packages to load data.
Created AWS S3 buckets, performed folder management in each bucket, and managed logs and objects within each bucket.
Integrated Hadoop into traditional ETL, accelerating the extraction, transformation, and loading of massive volumes of structured and unstructured data.
Implemented logging and error handling in SSIS packages, enhancing the monitoring and troubleshooting capabilities of ETL processes.
Developed a Python script to load CSV files into the S3 buckets.
Involved in designing and developing with Amazon EC2, Amazon S3, Amazon RDS, Amazon Elastic Load Balancing, Amazon SWF, Amazon SQS, and other AWS infrastructure services.
Managed containers using Docker by writing Dockerfiles, set up automated builds on Docker Hub, and installed and configured Kubernetes.
Conducted performance tuning of SSIS packages by analyzing execution plans and optimizing queries, resulting in faster data loads and improved resource utilization.
Worked extensively on importing metadata into Hive, migrated existing tables and applications to Hive and the AWS cloud, and made the data available in Athena and Snowflake.
Utilized SSIS data flow tasks to efficiently manage data transformation, incorporating advanced techniques such as data profiling and data lineage tracking.
Extensively used Stash/Bitbucket for code control and worked on AWS components such as Airflow, Elastic MapReduce (EMR), Athena, and Snowflake.
Environment: Spark, AWS, EC2, EMR, AWS S3, Glue, SSIS, SQL Workbench, Tableau, Sqoop, Spark Streaming, Scala, Python, Hadoop (Cloudera stack), Informatica, Jenkins, Docker, Hue, Netezza, Kafka, HBase, HDFS, Hive, Pig, Oracle, Git, Grafana.
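A minimal Spark Structured Streaming sketch of the kind of near-real-time S3 ingestion described above; the bucket prefix, schema, column names, and output paths are hypothetical assumptions, not the production pipeline:

# Illustrative only; paths, schema, and window sizes are assumptions.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.types import StructType, StructField, StringType, DoubleType, TimestampType

spark = SparkSession.builder.appName("s3-stream-sketch").getOrCreate()

schema = StructType([
    StructField("learner_id", StringType()),
    StructField("event_time", TimestampType()),
    StructField("score", DoubleType()),
])

# Pick up new files as they land in the S3 prefix
events = (
    spark.readStream
    .schema(schema)
    .json("s3a://example-bucket/incoming/")   # hypothetical landing prefix
)

# Aggregate on the fly; the watermark lets windowed results be appended to files
hourly = (
    events
    .withWatermark("event_time", "1 hour")
    .groupBy(F.window("event_time", "1 hour"), "learner_id")
    .agg(F.avg("score").alias("avg_score"))
)

# Persist the curated output to HDFS
query = (
    hourly.writeStream
    .outputMode("append")
    .format("parquet")
    .option("path", "hdfs:///data/learner_model/")                 # hypothetical target
    .option("checkpointLocation", "hdfs:///checkpoints/learner_model/")
    .trigger(processingTime="5 minutes")
    .start()
)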
Zoylo Digihealth, India. Duration: July 2021 - July 2022
Data Engineer
Responsible for reliability processes that increase efficiency, eliminate downtime, and maintain performance at scale across all platforms and environments.
Responsible for building ETL (Extract, Transform, Load) pipelines from the data lake to different databases based on requirements.
Developed highly complex Python and Scala code that is maintainable, easy to use, and satisfies application, data processing, and analytics requirements, using built-in libraries in Azure Databricks.
Automated the resulting scripts and workflows using Apache Airflow and shell scripting to ensure daily execution in production.
Installed and configured Apache Airflow for the S3 bucket and the Snowflake data warehouse, and created DAGs to run in Airflow (see the DAG sketch after this section).
Developed efficient MapReduce programs for filtering out unstructured data and multiple MapReduce jobs to perform data cleaning and pre-processing on Hortonworks.
Performed advanced procedures such as text analytics and processing using the in-memory computing capabilities of Spark with Scala.
Integrated SSIS with Azure services to enable real-time data processing and analytics, supporting business intelligence initiatives.
Used Spark Streaming APIs to perform transformations and actions on the fly to build the common learner data model, which receives data from Kafka in near real time and persists it to Cassandra.
Developed Spark scripts using Python on Azure HDInsight for data aggregation and validation, and verified their performance against MapReduce jobs.
Built and managed data pipelines using Azure Data Factory and Azure Databricks, ensuring efficient and reliable data processing and analysis workflows.
Involved in writing real-time processing and core jobs using Spark Streaming with Kafka as a data pipeline system.
Created and maintained user accounts and roles on Jira, MySQL, and production and staging servers.
Involved in data architecture, data profiling, data analysis, data mapping, and the design of data architecture artifacts.
Created data models and schema designs for Snowflake data warehouses to support complex analytical queries and reporting.
Worked on migrating data from AWS Redshift to a properly partitioned dataset on AWS S3.
Developed multiple Kafka producers and consumers per the software requirement specifications.
Utilized the Kafka pub-sub model for tracking real-time events in the data records to trigger processes for data orchestration.
Involved in all phases of data mining: data collection, data cleaning, model development, validation, and visualization.
Created SSIS packages using different control flow tasks such as Data Flow Task, Execute SQL Task, Sequence Container, For Each Loop Container, Send Mail Task, and Analysis Services Processing Task.
Converted SQL queries into Spark transformations using Spark RDDs, Python, PySpark, and Scala.
Involved in manipulating, cleansing, and processing data using Excel, Access, and SQL, and was responsible for loading, extracting, and validating client data.
Implemented several DAX functions for various fact calculations for efficient data visualization in Power BI, and optimized the DAX queries.
Automated advanced SQL queries and ETL routines using Apache Airflow to reduce repetitive weekly administration tasks.
Extracted data from sources such as SQL Server databases, SQL Server Analysis Services cubes, and Excel, and loaded it into the target MS SQL Server database.
Developed and implemented software release management strategies for various applications according to the agile process.
Participated in daily stand-up meetings to update the project status with the internal development team.
Environment: Kafka, Spark, AWS, Azure, Python, Scala, Airflow, ETL, SSIS, Redshift, Data Factory, Databricks, Jira, SQL, Snowflake, Power BI, Data Cleaning, Data Profiling, Data Mining, and Windows.
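A minimal Apache Airflow DAG sketch of the daily S3-to-Snowflake orchestration described above; the DAG ID, task names, schedule, and the placeholder callables are hypothetical assumptions rather than the actual pipeline:

# Illustrative only; dag_id, task names, and callables are assumptions.
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.python import PythonOperator


def extract_from_s3(**context):
    # Placeholder: pull the day's files from the landing bucket
    print("extracting s3://example-bucket/landing/ ...")


def load_to_snowflake(**context):
    # Placeholder: COPY the staged files into the Snowflake staging table
    print("loading into ANALYTICS.STAGING.DAILY_EVENTS ...")


with DAG(
    dag_id="s3_to_snowflake_daily",
    start_date=datetime(2021, 8, 1),
    schedule_interval="@daily",
    catchup=False,
    default_args={"retries": 2, "retry_delay": timedelta(minutes=5)},
) as dag:
    extract = PythonOperator(task_id="extract_from_s3", python_callable=extract_from_s3)
    load = PythonOperator(task_id="load_to_snowflake", python_callable=load_to_snowflake)

    extract >> load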
SEED Infotech, India. Duration: June 2019 - June 2021
Data Engineer
Responsibilities:
Handled data transformations based on the requirements.
Created an error-reprocessing framework to handle errors during subsequent loads.
Handled application log data by creating custom loggers.
Designed and built a custom, generic ETL framework as a Spark application using Scala.
Configured Spark jobs for weekly and monthly executions using AWS Data Pipeline.
Executed queries using Spark SQL for complex joins and data validation.
Developed complex transformations and mapplets using Informatica to extract, transform, and load data into data marts, the enterprise data warehouse (EDW), and the operational data store (ODS).
Created an SSIS package to resolve the dynamic source file name using the For Each Loop Container.
Used the Lookup, Merge, Data Conversion, Sort, and other data flow transformations in SSIS.
Built a continuous ETL pipeline using Kafka, Spark Streaming, and HDFS.
Performed ETL on data from various file formats (JSON, Parquet, and database sources).
Created independent components for AWS S3 connections and extracted data into Redshift.
Involved in writing Scala scripts to extract data from Cassandra operational data store tables for comparison with legacy system data.
Worked on a data ingestion file validation component covering threshold levels, last-modified timestamps, and checksums (a small validation sketch follows this section).
Implemented MDM practices to maintain accurate and consistent customer information, enforcing ETL rules to safeguard data integrity.
Leveraged OLAP tools, including ETL, data warehousing, and modelling, to extract, transform, and load data between SQL Server and Oracle databases, employing Informatica/SSIS for seamless data integration.
Actively participated in meetings with user groups to analyze requirements and provide recommendations for design and specification enhancements, ensuring solutions aligned with user needs.
Environment: Spark, Scala, AWS, S3, Cassandra, Redshift, Shell scripting, SSIS, Kafka, OLAP, Informatica, ETL.
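A minimal Python sketch of the kind of ingestion file validation described above (size threshold, last-modified age, checksum); the thresholds, file path, and function name are hypothetical assumptions, not the original framework:

# Illustrative only; thresholds, path, and function name are assumptions.
import hashlib
import os
import time


def validate_file(path, min_bytes=1024, max_age_hours=24, expected_md5=None):
    """Check size threshold, last-modified age, and checksum before ingesting."""
    stat = os.stat(path)

    if stat.st_size < min_bytes:
        return False, f"file too small: {stat.st_size} bytes"

    age_hours = (time.time() - stat.st_mtime) / 3600
    if age_hours > max_age_hours:
        return False, f"file too old: {age_hours:.1f} hours"

    if expected_md5 is not None:
        md5 = hashlib.md5()
        with open(path, "rb") as fh:
            for chunk in iter(lambda: fh.read(1 << 20), b""):
                md5.update(chunk)
        if md5.hexdigest() != expected_md5:
            return False, "checksum mismatch"

    return True, "ok"


# Example usage with a hypothetical landing file
ok, reason = validate_file("/data/incoming/orders_2021-06-01.csv")
print(ok, reason)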
Education:
Completed a Bachelor's in Computer Science and Engineering in 2019 from JNTU.
Completed a Master's in Computer and Information Science in 2024 from Saint Leo University.