Candidate's Name
Data Engineer/Analyst
Email: EMAIL AVAILABLE  Contact: PHONE NUMBER AVAILABLE  LinkedIn
New Jersey, Street Address
PROFESSIONAL SUMMARY:
Around 5 years of IT experience specializing in data engineering, with a proven track record of designing and implementing end-to-end data pipelines.
Managed data ingestion, transformation, and integration across various data sources using PySpark, SQL, and cloud services, optimizing data workflows and improving processing times by 30%.
Cloud data architect and engineer specializing in AWS and Azure frameworks, Cloudera, the Hadoop and Spark ecosystem, PySpark/Scala, Databricks, Hive, Redshift, Snowflake, relational databases, and tools such as Tableau, Airflow, dbt, and Presto/Athena; experienced with Data DevOps frameworks/pipelines and programming in Python.
Achieved a 50% reduction in data processing time through performance tuning and Spark DataFrame transformations.
Implemented real-time analytics solutions using Spark and Kafka, reducing data processing latency by 40%.
Applied the Kimball methodology in designing and optimizing dimensional data models to support business analytics and reporting needs.
Established data management practices by implementing data controls and lineage tracking with AWS Glue Data Catalog and Lake Formation, ensuring data accuracy and transparency across all stages of the data pipeline.
Skilled in BI reporting tools (Tableau, Microsoft Excel, Power BI) and complex SQL query construction.
Strong grasp of Hadoop YARN architecture and tools; experienced with Agile methodologies (Scrum and Kanban).

TECHNOLOGY STACK:
Languages: Python, SQL, Scala
Databases: MySQL, HBase, MongoDB, Oracle, DynamoDB
Cloud Services: AWS (S3, EC2, EMR, Redshift), Databricks, Azure Blob Storage, Purview, Azure Functions, Virtual Machines, Data Factory, Synapse Analytics
Tools: Git, VS Code, Jupyter Notebook, PyCharm
Frameworks: Pandas, NumPy, Matplotlib, Seaborn, TensorFlow, PyTorch, Scikit-learn
Data Visualization: Tableau, Excel, Power BI
Operating Systems: Windows, Linux
Analytical Skills: Data Wrangling, Preprocessing, Profiling, Mining, Analysis
Big Data Technologies: HDFS, MapReduce, Sqoop, Hive, HBase, Spark, YARN

CERTIFICATIONS:
Microsoft Certified: Azure Data Engineer Associate (DP-203)

PROFESSIONAL EXPERIENCE:

Universal Delivery Solutions, Jan 2024 - Present
Data Engineer
Designed and optimized data models, including fact tables for shipments and deliveries and dimension tables for customers and products, enhancing reporting and analytics capabilities.
Extracted data from MongoDB through Sqoop, landed it in HDFS, and processed it.
Extensively used Hive optimization techniques such as partitioning, bucketing, map joins, and parallel execution (a minimal illustrative sketch follows this role).
Reduced query execution times by 50% and enabled near-instantaneous data access through advanced SQL techniques and complex joins on datasets exceeding 10 million records.
Leveraged AWS EMR and Hadoop ecosystem tools such as Hive and Spark to process and analyze over 10 TB of package tracking data, improving delivery-time prediction accuracy by 30%.
Developed real-time data solutions using HBase and AWS Lambda to monitor live package tracking data, enabling quick responses to delivery deviations and reducing delays by 15%.
Developed modular, scalable data models using dbt within Snowflake to streamline transformation workflows, enhancing data pipeline efficiency by 30% and optimizing resource utilization across the platform.
Migrated 50+ Excel reports to Power BI and created advanced visualizations and custom charts, improving report clarity and user satisfaction by 30%.
Developed and deployed predictive analysis models using Python, identifying trends and patterns in data and providing insights that improved business decision-making by 35%.
Collaborated with cross-functional teams to gather requirements and help translate them into technical solutions, contributing to projects that improved business operations by 20%.
Environment: Python, SQL, Hive, Sqoop, Scala, Spark, Databricks, Power BI, Google Maps API, Azure SQL DB, Amazon S3, Azure Data Factory v2, Azure HDInsight, Azure Data Lake Storage (ADLS), Snowflake, dbt.
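Illustrative sketch (not project code): a minimal PySpark example of the Hive tuning named in the role above, showing a map-side (broadcast) join against a small dimension table followed by a partitioned, bucketed Hive table write. All table names, columns, and paths here are assumed for illustration only.

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

# Hypothetical Spark session; Hive support is needed for managed table writes.
spark = (
    SparkSession.builder
    .appName("shipment-fact-build")          # hypothetical app name
    .enableHiveSupport()
    .getOrCreate()
)

# Raw shipment events previously landed in HDFS (e.g., via Sqoop); path is assumed.
shipments = spark.read.parquet("hdfs:///data/raw/shipments")

# Broadcasting a small dimension table forces a map-side join,
# avoiding a shuffle of the large fact-side DataFrame.
customers = spark.table("dim_customers")     # hypothetical dimension table
enriched = shipments.join(F.broadcast(customers), on="customer_id", how="left")

# Partition by delivery date so date filters prune whole partitions, and
# bucket by customer_id so later joins/aggregations on it avoid full shuffles.
(
    enriched.write
    .mode("overwrite")
    .partitionBy("delivery_date")
    .bucketBy(32, "customer_id")
    .sortBy("customer_id")
    .saveAsTable("analytics.fact_shipments") # hypothetical target table
)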
Clark University, August 2023 - May 2024
Graduate Teaching Assistant, Data and Python Programming
Collaborated with faculty to develop and update the course syllabus and curriculum, focusing on practical and theoretical aspects of Azure and Python programming.
Created and maintained engaging instructional materials, including lecture slides, tutorials, and lab exercises tailored to a diverse student body.
Assisted in integrating current Azure services (such as Azure SQL Database and Azure Functions) into the curriculum to keep the course aligned with industry practices.
Implemented Azure Storage solutions, including Blob and Table Storage, to teach students about cloud data storage options.
Configured Infrastructure as Code (IaC) using Terraform and ARM templates to automate the provisioning and management of Azure environments, enhancing reproducibility and compliance.
Environment: Python, Pandas, NumPy, Matplotlib, Seaborn, Terraform, ARM Templates, Azure SQL Database, Azure Cosmos DB, Azure Databricks, Azure Virtual Machines, Azure DevOps.

Byjus Learning, India, July 2018 - August 2022
Data Engineer/Analyst
Analyzed sales data using SQL and Excel, identifying key trends and patterns that led to a 15% increase in quarterly revenue through targeted marketing campaigns.
Developed interactive dashboards in Tableau to visualize customer demographics and purchasing behaviors, reducing reporting time by 30% and improving data accessibility for stakeholders.
Collaborated with senior engineers to build data pipelines using Hadoop components such as Sqoop, Hive, and Spark, leading to a 20% increase in data processing speed.
Assisted in cloud migration projects to Amazon EMR and Databricks, helping to optimize resource allocation and reduce cloud infrastructure costs by 10%.
Performed data cleaning and transformation on AWS EMR with PySpark scripts (a minimal illustrative sketch appears at the end of this resume); worked extensively with the Amazon Redshift data warehouse, RDS, and DynamoDB; and used SQL queries and stored procedures in Amazon RDS for data extraction and integration, including scheduling EMR jobs for event-based data extraction.
Used Informatica PowerCenter Designer to analyze source data and extract and transform it from DB2, incorporating business rules with the objects and functions the tool supports.
Utilized NumPy and Pandas for basic data analysis tasks, supporting the development of machine learning models and improving data exploration efficiency by 20%.
Implemented CI/CD pipelines using Jenkins to automate the deployment and management of Hadoop data processing jobs and infrastructure updates, achieving a 30% reduction in deployment times.
Environment: Python, SQL, Excel, Pandas, NumPy, Matplotlib, Seaborn, Informatica, Amazon EMR, S3, EC2, Lambda, RDS, Databricks, HDFS, MapReduce, Sqoop, Hive, HBase, Spark, Tableau, Jenkins.

EDUCATION:
Clark University, Worcester, MA, USA, 2024
Master of Science in Data Analytics
MVSR Engineering College, Hyderabad, India, 2019
BE in Electrical and Electronics Engineering
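Illustrative sketch (not project code): a minimal PySpark cleaning-and-transformation job of the kind referenced in the Byjus Learning role, as it might run on Amazon EMR. Bucket names, columns, and cleaning rules are assumed for illustration; the partitioned Parquet output on S3 is the sort of staging a downstream Redshift COPY or Spectrum external table could consume.

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("sales-cleaning").getOrCreate()  # hypothetical app name

# Raw CSV extracts landed on S3 (hypothetical bucket/prefix).
raw = spark.read.option("header", True).csv("s3://example-raw/sales/")

cleaned = (
    raw.dropDuplicates(["order_id"])                               # drop duplicate orders
    .withColumn("amount", F.col("amount").cast("double"))          # normalize numeric type
    .filter(F.col("amount").isNotNull())                           # discard unparsable amounts
    .withColumn("order_date", F.to_date("order_date", "yyyy-MM-dd"))
    .fillna({"channel": "unknown"})                                # default missing sales channel
)

# Partitioned Parquet output for downstream warehouse loads.
(
    cleaned.write
    .mode("overwrite")
    .partitionBy("order_date")
    .parquet("s3://example-curated/sales/")                        # hypothetical curated bucket
)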