Candidate's Name
Email: EMAIL AVAILABLE | Overland Park, Kansas | Ph: PHONE NUMBER AVAILABLE

SUMMARY
Highly dynamic tech professional with a background in building data warehouse technical architectures and infrastructure components, developing ETL processes, designing, developing, and maintaining CI/CD pipelines, and delivering data visualization, reporting, and analytics tools. 5 years of experience with modern data platforms, APIs, SQL, DAX, and Power Query, developing enterprise data pipelines, lakes, and warehouses. Proven communication skills to engage effectively with customers and all relevant business stakeholders.

EDUCATION
Master's in Computer Science, Aug 2022 - Dec 2023
University of Central Missouri, Warrensburg, Missouri, USA. Grade: 3.1 / 4.0
Bachelor of Technology in Computer Science and Engineering, Aug 2018 - Jul 2022
Tirumala Engineering College, Jonnalagadda, Narasaraopet, India. Grade: 7.1 / 10.0

TECHNICAL SKILLS
Programming Languages: Python (NumPy, Pandas, Matplotlib), Java, SQL
Big Data Technologies: Hadoop, Spark, Hive, Pig, Kafka, Flume, OLTP, OLAP, MapReduce
Databases: Oracle, MySQL, PostgreSQL, MongoDB, Cassandra, Teradata, NoSQL
ETL/ELT Tools: Informatica, SSIS, SSRS, Talend
Cloud: Azure, Azure Databricks, Azure Data Factory, Azure SQL Server, Azure Data Lake, Azure Synapse Analytics, Azure DevOps, Azure Storage, AWS S3, Iceberg, SageMaker, EMR, Redshift, Glue, Athena, Step Functions, RDS, CloudWatch, DynamoDB, Kinesis, Lambda, SQS, SNS, ECS Fargate, AppFlow
Data Visualization: Power BI, Tableau
DevOps: CI/CD, Jenkins, Docker, Kubernetes, SDLC, Terraform
Tools/Frameworks/Utilities: MySQL, T-SQL, Jira, Confluence, Power BI, Tableau, Jupyter Notebook, PySpark, NumPy, Pandas, GitHub, Bitbucket, PyCharm, SDLC, Hadoop, MapReduce, Spark, Snowflake, Airflow, Informatica, Teradata, HBase, Jenkins
Operating Systems: Linux, Unix, Windows
Web Technologies: HTML, CSS, JavaScript, .NET

PROFESSIONAL EXPERIENCE

Fidelity Investments, Florida (Role: Data Engineer) Aug 2023 - Till Date
- Developed CI/CD pipelines and custom tools to manage 100+ compute jobs with an average turnaround time 10% faster than the industry average.
- Authored complex Unix and Bash scripts to automate data reconciliation and validation processes, improving efficiency by 35%.
- Designed and implemented ETL/ELT processes to extract, transform, and load data from various sources into Azure Data Lake Storage and Azure Synapse Analytics (see the PySpark sketch after this list).
- Received commendations from project managers and stakeholders for clear and concise communication, facilitating smooth project execution and client satisfaction.
- Developed and deployed production-ready data pipelines using Python and PySpark, improving data processing efficiency.
- Designed and implemented relational data models for complex business requirements, leading to improved data integrity and efficient query execution.
- Deployed and managed PaaS services on Azure, including Azure Data Factory and Azure Databricks, reducing infrastructure costs by 20% while improving scalability and performance.
- Developed tailored data solutions for clients, including predictive analytics models for patient readmission rates, resulting in a 15% reduction in readmission rates and cost savings.
- Designed and implemented a data lake solution using Hadoop ecosystem components, enabling centralized data storage and analytics for business users across the organization.
- Implemented agile development methodologies using Microsoft VSTS (Azure DevOps), leading to improved collaboration among team members and faster delivery of software releases.
- Configured seven ADF pipelines to automate and monitor data movement from 15 disparate sources into Azure Synapse Analytics, yielding an 85% reduction in manual ETL.
- Implemented database optimization techniques in Oracle, including index tuning and query optimization, resulting in a 40% reduction in query execution time and improved overall system performance.
- Integrated Scala applications with various Azure services such as Azure Blob Storage, Azure SQL Database, and Azure Event Hubs, enabling seamless data processing, storage, and communication.
- Spearheaded the implementation of Azure OpenAI services and related frameworks, including Azure Cognitive Services, Azure Machine Learning, and Azure Bot Services.
- Developed and optimized Hive queries and data processing workflows, enabling ad-hoc querying and analysis of structured and semi-structured data stored in Hadoop clusters, leading to actionable insights and informed decision-making.
- Implemented real-time data processing solutions using Kafka and Apache Flink, enabling the ingestion, processing, and analysis of high-volume streaming data for actionable insights.
- Configured GitLab integration with Azure services such as Azure Repos, Azure Pipelines, Azure Artifacts, and Azure Kubernetes Service (AKS) to streamline development workflows and enhance team productivity.
- Configured and deployed Flume agents to collect, aggregate, and transport log data and event streams from distributed sources to Hadoop for centralized storage and analysis, facilitating real-time monitoring of system logs and operational data.
- Designed and implemented robust data pipelines using Apache Cassandra, ensuring efficient and fault-tolerant data ingestion, storage, and retrieval for high-volume, distributed data sets, resulting in improved data availability and reliability.
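The following is a minimal, hypothetical PySpark sketch of the extract-transform-load pattern the bullets above describe (raw files landed in Azure Data Lake Storage, cleaned, and written to a curated zone that Synapse can query). The storage account, container names, and columns are illustrative assumptions, not code from the role.

```python
# Minimal ETL sketch: raw CSV in ADLS Gen2 -> typed, partitioned Parquet.
# "examplelake", the containers, and the columns are placeholder assumptions.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("orders-etl").getOrCreate()

# Extract: read raw CSV landed in an ADLS Gen2 container.
raw = spark.read.option("header", True).csv(
    "abfss://raw@examplelake.dfs.core.windows.net/orders/"
)

# Transform: type the columns, drop bad rows, derive a load date.
clean = (
    raw.withColumn("amount", F.col("amount").cast("double"))
       .filter(F.col("order_id").isNotNull())
       .withColumn("load_date", F.current_date())
)

# Load: write partitioned Parquet to a curated zone for downstream querying.
clean.write.mode("overwrite").partitionBy("load_date").parquet(
    "abfss://curated@examplelake.dfs.core.windows.net/orders/"
)
```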
Vision Solar, New Jersey (Role: Data Engineer) Dec 2022 - Jul 2023
- Utilized AWS services Athena, Redshift, and Glue ETL jobs to build AWS Glue Data Catalog tables, enabling seamless data processing and integration.
- Identified and resolved performance bottlenecks in data processing workflows, resulting in a 30% reduction in processing time and improved system reliability.
- Architected and deployed distributed computing applications using Apache Spark on AWS EMR, processing terabytes of data daily with high availability and fault tolerance.
- Utilized Delta Lake on AWS S3 for consistent and reliable data in data lake scenarios, accessible via Spark or SQL.
- Loaded data from AWS S3 into Snowflake and processed the data for further analysis.
- Created interactive dashboards in Tableau to visualize sales trends and customer behavior, providing actionable insights to sales and marketing teams and improving decision-making processes.
- Optimized SQL queries and database indexes in PostgreSQL, resulting in a 40% improvement in query performance and reduced database latency for critical business applications.
- Developed and deployed interactive dashboards in Qlik Sense and QlikView, providing real-time insights into key performance indicators (KPIs) and facilitating data-driven decision-making for business stakeholders.
- Designed and implemented data pipelines using Airflow, integrating with AWS services such as S3, EMR, and Redshift to orchestrate ETL processes and perform data transformations (see the DAG sketch after this list).
- Developed custom reports and dashboards in Yardi to provide actionable insights for decision-making processes.
- Integrated Java applications with various AWS services such as S3, DynamoDB, and SQS, enhancing functionality and enabling seamless data processing and communication.
- Built predictive models in R to forecast customer churn, leading to a 20% reduction in customer attrition and increased customer retention rates.
- Designed and implemented data manipulation and transformation workflows in SAS, streamlining data processing tasks and reducing manual effort by 30%.
- Designed and implemented real-time data storage and retrieval systems using HDFS and HBase, leveraging HBase's NoSQL capabilities to handle high-velocity data streams and support low-latency data access for critical business applications.
- Designed and developed a microservices architecture using Scala and deployed it on AWS ECS and EKS, enabling scalability, fault tolerance, and flexibility.
- Implemented data ingestion pipelines using Sqoop to efficiently transfer data between Hadoop and relational databases (RDBMS), automating import and export and reducing manual effort and data transfer latency.
- Deployed and managed various PaaS services on AWS, including Amazon EMR, Amazon Redshift, AWS Glue, and AWS Lambda, optimizing infrastructure costs and enhancing scalability and reliability for data processing and analytics workloads.
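A minimal sketch of the Airflow orchestration pattern named above, assuming Airflow 2.x with the Amazon provider installed; the bucket, schema, table, and connection IDs are placeholder assumptions.

```python
# Hypothetical daily DAG that COPYs the day's S3 extract into Redshift.
from datetime import datetime

from airflow import DAG
from airflow.providers.amazon.aws.transfers.s3_to_redshift import (
    S3ToRedshiftOperator,
)

with DAG(
    dag_id="s3_to_redshift_daily",
    start_date=datetime(2023, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    # COPY the partition for the execution date into a staging table.
    load_sales = S3ToRedshiftOperator(
        task_id="load_sales",
        schema="staging",
        table="sales",
        s3_bucket="example-data-lake",     # placeholder bucket
        s3_key="sales/{{ ds }}/",          # templated by execution date
        copy_options=["FORMAT AS PARQUET"],
        aws_conn_id="aws_default",
        redshift_conn_id="redshift_default",
    )
```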
Amdocs, Pune (Role: Big Data Engineer) Sep 2019 - Jun 2022
- Optimized SQL queries for efficiency, reducing query response times by an average of 50% across all databases.
- Worked with multiple file formats such as ORC, Avro, XML, Parquet, and JSON.
- Performed Linux system administration, including installation, configuration, troubleshooting, and maintenance of distributions such as Ubuntu, CentOS, and Red Hat.
- Used GitHub for source code version control.
- Applied data modeling concepts such as star-schema modeling, snowflake-schema modeling, and fact and dimension tables.
- Designed and implemented solutions on Azure Cloud infrastructure, particularly Azure Data Factory and Azure DevOps.
- Leveraged Terraform to define, manage, and provision Azure infrastructure resources for data engineering projects.
- Developed highly complex Python and Scala code for data processing and analytics using built-in libraries.
- Utilized Azure ML Studio and the Python SDK to preprocess data, train models, and evaluate performance metrics for regression, classification, and clustering tasks.
- Applied expertise in SQL and database design, including Azure SQL and Cosmos DB, consistently delivering efficient queries on large datasets for improved performance and usability.
- Designed and implemented a highly scalable data pipeline architecture that increased processing efficiency by 50% and reduced data ingestion time by 30%.
- Worked with NoSQL databases such as HBase, MongoDB, and Cassandra.
- Designed and implemented complex ETL workflows using Matillion, reducing data integration time by 50%.
- Developed Apache Flink streaming applications to perform complex event processing (CEP) and real-time analytics on streaming data sources, enabling proactive decision-making and trend analysis.
- Integrated KQL queries with Azure services such as Azure Monitor, Azure Log Analytics, and Azure Application Insights, enabling comprehensive monitoring, logging, and analysis of cloud resources and applications.
- Implemented dbt to streamline data transformation processes, resulting in a 30% reduction in data processing time.
- Constructed Azure Data Lake Storage instances to curate and catalog 1 TB of sensitive transactional data for downstream analytics use, improving system performance.
- Implemented Apache Kafka security protocols (SSL, SASL) to ensure data privacy and compliance with regulatory standards (see the producer-config sketch after this list).
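A minimal sketch of the Kafka SASL/SSL hardening pattern named above, using the kafka-python client; broker address, credentials, topic, and the CA path are placeholder assumptions, and in practice the password would come from a secret store.

```python
# Hypothetical producer configured for SASL_SSL (encrypted, authenticated).
import json

from kafka import KafkaProducer

producer = KafkaProducer(
    bootstrap_servers=["broker1.example.com:9093"],  # placeholder broker
    security_protocol="SASL_SSL",        # TLS in transit + SASL auth
    sasl_mechanism="SCRAM-SHA-512",      # one common SASL mechanism choice
    sasl_plain_username="pipeline-svc",
    sasl_plain_password="change-me",     # placeholder; use a secret store
    ssl_cafile="/etc/kafka/ca.pem",      # CA that signed the broker certs
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

producer.send("transactions", {"order_id": 42, "amount": 99.5})
producer.flush()
```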
TECHNICAL PROJECTS

Hotel Management System (Python, SQL): Manages the details of hotel rooms, services, payments, and bookings; a minimal sketch of the booking-storage pattern appears below.
Banking System: Banking management software that maintains customer data and provides a user-friendly interface for retrieving customer-related details within seconds, with 100% accuracy.
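A minimal, hypothetical sketch of how the Hotel Management System's booking storage could look in the stated Python + SQL stack; the table and column names are illustrative assumptions.

```python
# Hypothetical booking storage using SQLite: create, insert, query.
import sqlite3

conn = sqlite3.connect("hotel.db")
conn.execute(
    """
    CREATE TABLE IF NOT EXISTS bookings (
        booking_id INTEGER PRIMARY KEY AUTOINCREMENT,
        guest_name TEXT NOT NULL,
        room_no    INTEGER NOT NULL,
        check_in   TEXT NOT NULL,
        check_out  TEXT NOT NULL
    )
    """
)

# Record a booking, then look it up by room number.
conn.execute(
    "INSERT INTO bookings (guest_name, room_no, check_in, check_out) "
    "VALUES (?, ?, ?, ?)",
    ("A. Guest", 101, "2024-01-05", "2024-01-07"),
)
conn.commit()
print(conn.execute("SELECT * FROM bookings WHERE room_no = 101").fetchall())
```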