Candidate's Name
Boston, MA | PHONE NUMBER AVAILABLE | EMAIL AVAILABLE | LinkedIn | GitHub
Seeking Full-Time Data Engineering Opportunities starting June
EDUCATION
Northeastern University Boston, Massachusetts
Master of Science in Information Systems May
Vivekanand Education Society's Institute of Technology Mumbai, India
Bachelor of Engineering in Electronics Engineering May 2020
EVIDENCE OF EXCELLENCE
Created dynamic dashboards using Power BI and Tableau, facilitating executive-level decisions
Managed and optimized data storage across MySQL, Spark, and AWS for large-scale projects
Developed over 5 complex ETL/ELT pipelines using Python and Airflow, optimizing data workflows
Implemented 5+ machine learning models using TensorFlow and Keras, contributing to predictive analytics solutions
Led training sessions on Python, Power BI, Tableau, ELK Stack, and SQL for new hires
EXPERIENCE
National Stock Exchange Information Technology Mumbai, India
Data Analyst March 2021 - December 2021
Developed and delivered over 50 BI Publisher reports and 5 integrations on financials modules, streamlining financial reporting
and enhancing data-driven decision-making
Optimized BI Publisher code performance by 50% through query optimization, redundant table call elimination, and
pre-execution WITH clause implementation, significantly improving data analysis efficiency
Automated the extraction and visualization of over 200GB of monthly examination data using SQL and Power BI, reducing the
time to insights by 50% and enabling stakeholders to make better decisions
Designed, implemented, and managed Airflow orchestration for top clients, enabling real-time monitoring of system events in
critical internal applications by performing log analysis
Developed and deployed a machine learning pipeline to process 20 TB of daily stock data using Apache Spark to identify trader
outliers, achieving 90% better accuracy than prior methods
Developed a data pipeline to detect instances of cheating among students using Python and Tableau, automating data analysis and
reporting, saving over 10 hours of manual work each week and enhancing academic integrity
Conducted training sessions for new hires on Python, Power BI, Tableau, ELK Stack, and SQL, enabling new employees to
contribute effectively to data-driven initiatives
PROJECTS
Iowa Liquor Sales
Extracted data from Google BigQuery and Iowa government website, creating a robust 26 million record dataset for analysis
Performed comprehensive data profiling using Alteryx to identify and address anomalies, ensuring data integrity and accuracy
Built a dimensional model with 8 dimension tables and 4 fact tables in Navicat for enhanced query performance and data organization
Developed a robust data pipeline using Alteryx and designed visually compelling interactive dashboards and reports in Power BI
to effectively communicate insights derived from the analysis
Traffic Sign Recognition
Executed traffic sign recognition project using SVM, Random Forest, CNNs, and Scikit-Learn's MLP Classifier
Conducted Exploratory Data Analysis (EDA) on the German Traffic Sign Recognition Benchmark dataset, gaining insights into
data distribution, class imbalances, and visualizing feature relationships
Analyzed custom CNNs vs. VGG-16 transfer learning, evaluating accuracy, loss, and applying regularization for optimization
Attained CNN accuracies of 89.31% and 95.63% with the SGD and Adam optimizers, respectively. Top Random Forest:
79.66% accuracy; SVM: 83.00% accuracy
Heart Failure Prediction and Analytics
Led team in employing 9 ML algorithms for precise heart condition prediction, enhancing healthcare and diagnoses
Applied rigorous hyper-parameter tuning to promote generalization, mitigate overfitting, and bolster model accuracy
Leveraged stacking, an ensemble learning technique, to create a potent predictive model
Random Forest emerged as the leading performer with a result of 85.81%, while the Stacking Classifier trailed closely
at 84.42%, reinforcing the efficacy of the ensemble approach
SKILLS
Programming Languages: Python, Advanced SQL, Java, HTML, C, C++
Data Engineering and Data Visualization: Talend, Alteryx, Airflow, Informatica, Power BI, Tableau, Elasticsearch, Apache Spark
Data Science and Machine Learning: NumPy, Pandas, Matplotlib, scikit-learn, TensorFlow, OpenCV, PyTorch, Seaborn, Keras
Cloud Services: AWS S3, GCP, Azure Data Factory, Databricks, Lambda