Candidate's Name
Lead Data Scientist / Generative AI Expert / ML Engineer
Email: EMAIL AVAILABLE | Phone: PHONE NUMBER AVAILABLE

Summary: 9 years in Data Science & Machine Learning, and 11 years in IT overall.

Data Scientist and AI Engineer experienced in the entire end-to-end machine learning process, from data preparation to production, deploying and monitoring models for real-time and batch inference. Proficient in analytical and statistical techniques, including machine learning, deep learning, generative AI, and large language models. Skilled in cloud services including AWS, GCP, and Azure. Known for creative problem-solving, such as developing a generative AI gift-suggestion tool and automating incident reporting with NLP. Passionate about pushing data's limits to drive innovation in every data-related challenge.

Profile Summary

- Generative AI: Using the power of Large Language Models (LLMs), I push the boundaries of innovation in projects spanning Generative AI, MLOps, and AWS cloud services.
- Machine Learning: Proficient in machine learning techniques; I meticulously orchestrate data pipelines and enrich projects with advanced insights.
- MLOps: Orchestrated data pipelines and applied MLOps practices to optimize performance, ensuring seamless model deployment and monitoring.
- Cloud: Extensive experience with third-party cloud resources: AWS, Google Cloud, and Azure. Proficient in a multitude of AWS services, including EC2 for scalable compute, ECR for container management, ECS for orchestration, Lambda for serverless computing, SageMaker for machine learning, API Gateway for secure API management, and Glue for ETL workflows.
- Data Analysis: Adept in Python, SQL, Pandas, Matplotlib, Seaborn, and advanced data analysis tools, transforming raw data into valuable insights.
- NLP: Applied Natural Language Processing techniques, including tokenization, stemming, lemmatization, Word2Vec, Transformers, sentiment analysis, Named Entity Recognition, and Topic Modeling, to extract actionable intelligence from text data.
- Deep Learning: Developed and trained state-of-the-art Artificial Neural Networks (ANNs), RNNs, LSTMs, Transformers, and deep learning models using Keras, TensorFlow, and PyTorch.
- CI/CD Orchestrator: Established and maintained robust CI/CD workflows using Jenkins and GitHub Actions, enabling agile software development and deployment.
- Collaborative Leader: Led cross-functional teams with agility, conducted agile ceremonies, and meticulously documented project progress for transparent and efficient project delivery.
- Big Data: Worked with and queried large data sets from big data stores using Hadoop data lakes, data warehouses, Amazon AWS, Snowflake, Redshift, Aurora, and NoSQL.
- Innovation Catalyst: Continuously exploring emerging technologies and methodologies to drive innovation and stay at the forefront of data-driven solutions.

Technical Skills

- Generative AI: LangChain, prompt engineering, GPT-3, GPT-3.5 Turbo, OpenAI Davinci, PaLM, LLMs, Stable Diffusion, chatbots, and GANs
- Machine Learning: Supervised algorithms (Linear Regression, Logistic Regression, Support Vector Machines, Decision Trees, ensemble techniques including Bagging, Boosting, and Stacking, Random Forest, XGBoost, Naive Bayes Classifiers, K-Nearest Neighbors), unsupervised algorithms (PCA, K-Means Clustering, Gaussian Mixtures, Hidden Markov Models, Autoencoders), imbalanced learning (SMOTE, ADASYN), Deep Learning, Artificial Neural Networks, predictive analysis, Transfer Learning
- Analytics: Data Analysis, Data Mining, Data Visualization, Statistical Analysis, Multivariate Analysis, Stochastic Optimization, Linear Regression, ANOVA, Hypothesis Testing, Chi-Square Test, Forecasting, ARIMA, SARIMAX, Prophet, Sentiment Analysis, Predictive Analysis, Pattern Recognition, Classification, Behavioral Modeling
- Natural Language Processing: document tokenization, token embedding, word models, Word2Vec, FastText, Bag-of-Words, TF-IDF, BERT, GPT, ELMo, LDA, Transformers
- Programming Languages: Python, R, SQL, JavaScript, CSS, Bootstrap, MATLAB, .NET, HTML, Mathematica, Flask
- Applications: Machine Language Comprehension, Sentiment Analysis, Predictive Maintenance, Demand Forecasting, Fraud Detection, Client Segmentation, Marketing Analysis, Cloud Analytics on cloud-based platforms (AWS, MS Azure, Google Cloud Platform)
- Development: version control and collaboration tools (Git, GitHub, GitLab, Bitbucket, SVN, Mercurial), Jira, Trello, PyCharm, Visual Studio, Sublime, TFS, Linux, Ubuntu, Tableau, interactive dashboards
- Big Data & Cloud Tools: HDFS, Spark, Google Cloud Platform (GCP), MS Azure Cloud, SQL, NoSQL, Data Warehouse, Data Lake, HiveQL, AWS (Redshift, Kinesis, EMR, EC2, Lambda, CodeBuild, CodeDeploy), Snowflake
- Deployment: continuous improvement in project processes, workflows, automation, and ongoing learning

Professional Experience

MLOps Engineer (July 2023 – Present)
Southern Company, Atlanta, GA

As a Lead Data Scientist/ML Engineer at Southern Company, I improved search engine optimization and product recommendations by analyzing customer search queries and product descriptions with advanced NLP and deep learning techniques. I developed predictive models for customer churn and conversion, created customer personas for targeted marketing, and employed NLP for customer sentiment analysis. By deploying models on GCP and Cloud Elasticsearch, I achieved significant improvements in customer satisfaction, sales, and SEO. I also conducted market segmentation and developed interactive dashboards for data-driven decision-making.

- Examined customer search queries and product descriptions using advanced NLP and deep learning methodologies such as keyword extraction with TF-IDF, embeddings with Word2Vec, topic modeling with LDA, and language translation with OpenNMT. This analysis improved search engine optimization and product recommendations, leading to greater visibility and increased sales.
- Analyzed historical customer data to build a predictive model for customer churn, enabling proactive retention strategies.
- Developed detailed customer personas based on diverse attributes, enabling targeted marketing strategies and personalized customer experiences.
- Employed NLP techniques including sentiment analysis with BERT, text classification with ULMFiT, named entity recognition using spaCy, and text summarization via GPT-3 to assess customer satisfaction, enhance customer experiences, and drive sales growth.
- Created customer churn prediction models using Logistic Regression, Random Forest, and XGBoost to identify customers at risk of churning (a minimal sketch follows this list).
- Implemented a hybrid OCR system combining traditional rule-based OCR with deep learning-based OCR for improved accuracy and document-handling flexibility.
- Created ensembles of CNN and RNN models for text recognition in images, significantly improving OCR system accuracy and robustness.
- Deployed OCR and NLP models using Vertex AI for model deployment, monitoring, and management in a production environment.
- Utilized Google Cloud Functions to trigger OCR and NLP models for real-time product recommendations.
- Set up Azure Log Analytics for monitoring and troubleshooting OCR and NLP models in production.
- Deployed models on GCP Vertex AI to ensure scalability and accessibility.
- Implemented models on Cloud Elasticsearch for enhanced search functionality and on Cloud API for natural language processing tasks.
- Studied purchase patterns, including quantity, probability, and brand choices relative to pricing variations, to inform pricing-strategy decisions.
- Utilized deep learning frameworks such as TensorFlow and Keras to predict customer conversion probabilities, optimizing marketing expenditures.
- Leveraged tools like TensorFlow, PyTorch, NLTK, and spaCy to apply NLP and deep learning techniques in e-commerce applications.
- Achieved notable improvements in customer satisfaction, sales, and SEO by deploying NLP and deep learning solutions on GCP, utilizing services like BigQuery, Cloud Storage, and Dataflow for scalability and accessibility.
- Developed SARIMAX models to forecast sales across multiple stores, sharing the forecasts via dashboards with store managers for better inventory management.
- Extracted insights from large datasets using data mining techniques with PySpark and developed ETL pipelines.
- Containerized models using Docker and deployed them on Kubernetes clusters.
- Applied feature engineering for dimensionality reduction and enhanced model performance.
- Conducted market segmentation through cluster analysis (Hierarchical Clustering and K-Means) and dimensionality reduction (PCA).
- Created dynamic data visualizations and interactive dashboards using Tableau, Power BI, and Plotly.
- Collaborated with stakeholders, communicating project progress and results and presenting data-driven recommendations.
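A minimal sketch of the kind of churn-prediction baseline described above, using scikit-learn; the CSV path and column names are hypothetical placeholders, not the actual Southern Company data.

```python
# Sketch: churn-prediction baseline with a Random Forest.
# "customer_history.csv" and the "churned" column are hypothetical.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split

df = pd.read_csv("customer_history.csv")   # hypothetical customer extract
X = df.drop(columns=["churned"])           # features: tenure, spend, etc.
y = df["churned"]                          # 1 = churned, 0 = retained

# Stratified split preserves the churn rate in both partitions.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

model = RandomForestClassifier(n_estimators=300, class_weight="balanced")
model.fit(X_train, y_train)

print(classification_report(y_test, model.predict(X_test)))
```

The same data and split would serve a Logistic Regression or XGBoost variant; swapping the estimator is a one-line change.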
Generative AI Specialist / Lead Data Scientist (May 2021 – Jun 2023)
Oscar Health, New York, NY

As a Generative AI Specialist at Oscar Health, I automated incident report narratives using generative AI. The project enhanced incident reporting within the technology organization by leveraging generative AI to automatically produce detailed, coherent narratives for internal incident reports. By utilizing Large Language Models, the tool enabled IT and operations teams to efficiently document and communicate technical issues to stakeholders.

- Utilized frameworks like LangChain and LlamaIndex to develop applications powered by language models, building a chatbot that summarizes, analyzes, or generates responses for internal incident reports by interacting with APIs (a minimal sketch follows this list).
- Built a data analytics chatbot for analyzing patient healthcare records, integrating RAG with multiple agents.
- Scaled the chatbot to thousands of users using ECS clusters, AWS OpenSearch, and Bedrock models.
- Conducted in-depth interviews with IT and operations teams to identify specific pain points and requirements for incident reporting.
- Gathered detailed requirements for the tool, including desired features, data sources, and user workflows.
- Utilized the advanced capabilities of GPT-3 and Hugging Face Transformers for text generation and narrative construction.
- Applied NLP techniques for text generation and data extraction, ensuring accurate and contextually relevant output.
- Used HTML, CSS, and JavaScript to build a user-friendly interface supporting efficient data input and interaction.
- Leveraged Python and Flask to develop a scalable, efficient backend that integrates with the AI model.
- Implemented RESTful APIs to facilitate seamless communication between the front-end interface and backend processing.
- Deployed the solution on AWS and Azure, ensuring scalability, reliability, and security.
- Processed and managed log files and technical data to ensure accurate and effective model training and inference.
- Maintained regular communication with stakeholders to incorporate their feedback and refine the tool's functionality.
- Performed a gap analysis to identify inefficiencies in the existing incident reporting process.
- Conceptualized the design of a tool that leverages generative AI to automate the transformation of technical data into human-readable narratives.
- Developed a business case justifying the tool, highlighting potential time savings and improvements in communication.
- Curated a dataset of historical incident reports, including log files, timestamps, error codes, and resolutions.
- Engineered relevant features from raw data to enhance model training and improve narrative generation.
- Evaluated various LLMs, considering factors such as model size, training data, and performance metrics.
- Customized the training process to focus on the specific language and terminology used in incident reports.
- Conducted hyperparameter tuning to optimize the model's performance for generating high-quality narratives.
- Created a responsive, intuitive web interface that facilitates easy input of technical data by IT and operations teams.
- Developed robust backend logic to handle data processing, model inference, and result generation.
- Implemented security measures to protect sensitive data and ensure secure communication between the web interface and the backend.
- Performed unit testing on individual components to ensure functionality and reliability.
- Conducted integration testing to validate the seamless operation of the entire tool, from data input to narrative generation.
- Engaged IT experts in user acceptance testing, ensuring the tool met their expectations and requirements.
- Deployed the tool to a production environment, ensuring it was accessible and reliable for end users.
- Monitored the tool's performance post-deployment, addressing issues and optimizing as needed.
- Analyzed the impact of the tool on incident reporting efficiency, collecting feedback and usage metrics to demonstrate its value.
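A minimal sketch of the retrieval-augmented (RAG) chatbot pattern described above, assuming the classic langchain 0.0.x API, a local FAISS index, and an OPENAI_API_KEY in the environment; the two sample reports are invented stand-ins, not actual incident data.

```python
# Sketch: RAG question answering over historical incident reports.
# Assumes: pip install langchain openai faiss-cpu, and OPENAI_API_KEY set.
from langchain.chains import RetrievalQA
from langchain.embeddings import OpenAIEmbeddings
from langchain.llms import OpenAI
from langchain.vectorstores import FAISS

# Hypothetical corpus: real usage would load the curated report dataset.
reports = [
    "2021-03-04 outage: API gateway returned 502s after a bad deploy ...",
    "2021-05-11 incident: database failover caused 9 minutes of downtime ...",
]

# Embed the reports and index them for similarity search.
index = FAISS.from_texts(reports, OpenAIEmbeddings())

# Chain: retrieve the most relevant reports, then ask the LLM to answer.
qa = RetrievalQA.from_chain_type(
    llm=OpenAI(temperature=0),
    retriever=index.as_retriever(),
)

print(qa.run("Summarize the most recent database incident."))
```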
Lead Data Scientist/ML Engineer (Nov 2019 – Apr 2021)
U.S. Bancorp, Minneapolis, Minnesota

As a Lead Data Scientist at U.S. Bancorp, I built and optimized machine learning models using Scikit-Learn and AWS SageMaker, focusing on hyperparameter tuning and efficient resource utilization. I designed and managed data workflows with Apache Airflow, automating complex pipelines, and stratified imbalanced data for fair representation in cross-validation. I developed anomaly detection models, including artificial neural networks and unsupervised techniques, for fraud detection. Additionally, I created visualizations using R's ggplot2 and developed interactive dashboards with Tableau for insightful data analysis.

- Conducted hyperparameter tuning using Scikit-Learn's model selection framework with the GridSearchCV and RandomizedSearchCV algorithms (a minimal sketch follows this list).
- Designed, implemented, and managed data workflows using Apache Airflow, creating and maintaining DAGs to automate complex data pipelines.
- Stratified imbalanced data to ensure fair representation of minority classes in cross-validation sets.
- Built artificial neural network models to detect anomalies and fraud in transaction data.
- Manipulated data and created visualizations using R's dplyr and ggplot2 for exploratory data analysis (EDA).
- Employed a heterogeneous stacked ensemble of methods for final decisions on fraudulent transactions.
- Consulted regulatory and subject-matter experts to understand data streams and variables.
- Cleaned and preprocessed data, scaled features, and performed feature engineering using Python libraries like pandas and NumPy; built models with deep learning frameworks.
- Utilized AWS SageMaker for end-to-end machine learning model development, from data preprocessing and feature engineering to model training, deployment, and monitoring.
- Implemented a Python-based distributed random forest.
- Extracted data from Hive databases on Hadoop using Spark through PySpark.
- Utilized Scikit-Learn, SciPy, Matplotlib, and Plotly for EDA and data visualization.
- Developed unsupervised models such as K-Means and Gaussian Mixture Models (GMM) from scratch in NumPy for anomaly detection.
- Performed data mining and developed statistical models in Python to provide tactical recommendations to business executives.
- Used predictive modeling tools in SAS, SPSS, R, and Python.
- Optimized and fine-tuned machine learning models on SageMaker for efficient resource utilization and cost-effectiveness.
- Developed and deployed models behind a Flask API packaged in a Docker container.
- Evaluated model performance using metrics such as the confusion matrix, accuracy, recall, precision, and F1 score, with particular attention to recall.
- Used Git for version control and collaborated with team members on GitHub.
- Designed Tableau dashboards and provided complex reports, including summaries, charts, and graphs, to interpret findings for the team and stakeholders.
- Worked with data engineers on database design for data science projects.
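A minimal sketch of stratified hyperparameter tuning with GridSearchCV, combining the first and third bullets above; the synthetic transaction features, fraud rate, and parameter grid are illustrative assumptions, not the actual U.S. Bancorp setup.

```python
# Sketch: grid search over a Random Forest with stratified CV folds,
# scored on recall, on synthetic imbalanced "transaction" data.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, StratifiedKFold

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 10))              # stand-in transaction features
y = (rng.random(1000) < 0.05).astype(int)    # ~5% fraud: imbalanced labels

# Stratified folds keep the minority class represented in every split.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

param_grid = {"n_estimators": [100, 300], "max_depth": [4, 8, None]}
search = GridSearchCV(
    RandomForestClassifier(class_weight="balanced", random_state=0),
    param_grid,
    scoring="recall",    # recall prioritized, as in the evaluation above
    cv=cv,
)
search.fit(X, y)
print(search.best_params_, search.best_score_)
```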
Sr. Data Scientist (Mar 2018 – Oct 2019)
State Farm, Bloomington, Illinois

As a Senior Data Scientist at State Farm, I conducted multivariate analysis to assess the impact of the WorkSafe Champions safety program on policies. Collaborating with the Special Investigations Unit (SIU), I evaluated the feasibility of detecting insurance fraud using machine learning, providing insights into potentially fraudulent claims and activities.

- Operated within a Cloudera Hadoop environment, utilizing Python, SQL, and Tableau to extract insights from extensive datasets.
- Conducted research to evaluate fraud predictive analytics scenarios, developing models and analyzing data to identify patterns and predict outcomes for new claims.
- Used Python, Pandas, NumPy, and SciPy for exploratory data analysis, data wrangling, and feature engineering.
- Evaluated anomaly detection models, including Expectation Maximization, Elliptic Envelopes, and Isolation Forests, to enhance fraud detection capabilities (a minimal sketch follows this list).
- Extracted and analyzed data from the Hadoop Distributed File System (HDFS) on Cloudera.
- Created a comprehensive Tableau dashboard to present the organization's annual report.
- Explored kernel density estimation in lower-dimensional spaces as a predictive feature in fraud detection.
- Conducted multivariate analysis on safety programs spanning multiple years to uncover valuable insights.
- Leveraged regression analysis to establish correlations between participation in safety programs and claims outcomes.
- Executed hypothesis testing and rigorous statistical analysis to identify significant changes in claims following participation in safety programs.
- Managed the integration of fraud data with claims data, handling large datasets with many observations.
- Collaborated with fellow data scientists on use cases including workplace accident prediction and sentiment analysis, engaging with stakeholders to develop predictive models and drive data-driven decision-making.
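A minimal sketch of Isolation Forest claim scoring of the kind evaluated above; the two claim features (amount, days to file) are synthetic stand-ins, not State Farm data.

```python
# Sketch: unsupervised anomaly scoring of insurance claims with an
# Isolation Forest; -1 predictions mark candidates for SIU review.
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(1)
# Synthetic claims: (claim amount, days to file); a few outliers mixed in.
normal = rng.normal(loc=[1000, 10], scale=[200, 3], size=(500, 2))
odd = rng.normal(loc=[5000, 40], scale=[500, 5], size=(10, 2))
claims = np.vstack([normal, odd])

model = IsolationForest(contamination=0.02, random_state=1).fit(claims)
flags = model.predict(claims)    # -1 = anomalous, 1 = normal
print(f"{(flags == -1).sum()} claims flagged for SIU review")
```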
Sr. Data Scientist (Sep 2016 – Feb 2018)
ThirdEye Data, San Jose, CA

As a Senior Data Scientist with ThirdEye Data, I led the Investment Portfolio project, utilizing Natural Language Processing (NLP) and time series analysis to revolutionize investment portfolio management. This initiative focused on enhancing predictive analytics for rebalancing stock portfolios and exploring various algorithmic trading theories and ideas. The project aimed to provide innovative solutions for more effective investment strategies.

- Leveraged Python modules for machine learning and predictive analytics.
- Implemented advanced machine learning algorithms using Spark, MLlib, R, and other relevant tools.
- Scripted in R, Java, and Python to perform data analysis and manipulation, ensuring data accuracy and quality.
- Developed data dictionaries to generate metadata reports for both technical and business requirements.
- Created reporting dashboards to visualize statistical models and track key metrics and risk indicators.
- Utilized ensemble models such as Random Forest to enhance model performance.
- Extracted and transformed data from MySQL, preparing complex data streams for analytical tools.
- Explored various regression and ensemble models for forecasting and developing new financial models.
- Improved model efficiency and accuracy through rigorous evaluation and refinement in R.
- Defined source-to-target data mappings, established business rules, and refined data definitions.
- Conducted end-to-end Informatica ETL testing and crafted complex SQL queries for source and target database comparisons.

Data Scientist (Jan 2015 – Aug 2016)
Target Corporation, Minneapolis

As a Data Scientist on the digital fulfillment team at Target Corporation, I specialized in model building to enhance e-commerce order tracking and streamline operations. My role involved supporting the Data Science team by developing machine learning models, ensuring efficient and reliable performance in our digital fulfillment processes.

- Architected data pipelines through SQL scripts, achieving optimal performance across production, QA, and development environments.
- Monitored and managed data pipelines to ensure seamless execution and reliability throughout different stages.
- Identified bottlenecks and optimized script runtimes by leveraging historical metrics to drive significant performance enhancements.
- Scheduled scripts for dynamic execution intervals based on use cases and model requirements.
- Enhanced error handling through try/except blocks to swiftly replicate and isolate errors for efficient root-cause analysis (a minimal sketch follows this list).
- Tracked, resolved, and documented real-time errors to maintain a high level of system robustness.
- Orchestrated Jenkins pipelines using JSON to facilitate seamless integration and deployment.
- Collaborated closely with business stakeholders and data scientists from various teams to gather requirements and share weekly updates and findings.
- Developed and deployed end-to-end machine learning pipelines for retail analytics, automating the deployment of predictive models into production environments to ensure timely, seamless integration of data science solutions into retail systems.
- Worked closely with cross-functional teams, including data scientists, software engineers, and business analysts, to understand retail domain requirements and align machine learning solutions with business goals.
- Facilitated effective communication between technical and non-technical stakeholders to drive the successful implementation of ML models in retail operations.
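A minimal sketch of the try/except error isolation described in the Target bullets above; the step-runner pattern, logger name, and the sample step are illustrative assumptions, not the actual production scripts.

```python
# Sketch: wrap each pipeline step so failures are logged with enough
# context (step name, arguments, traceback) to replicate and isolate.
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("fulfillment_pipeline")

def run_step(name, fn, *args):
    """Run one pipeline step, logging failures for root-cause analysis."""
    try:
        result = fn(*args)
        log.info("step %s succeeded", name)
        return result
    except Exception:
        log.exception("step %s failed with args=%r", name, args)
        raise  # re-raise so the scheduler marks the run as failed

# Hypothetical usage with a trivial stand-in step:
run_step("load_orders", lambda day: f"orders for {day}", "2016-01-15")
```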
Data Analyst (Jan 2013 – Dec 2014)
Alteryx, Inc., Irvine, CA

- Utilized Alteryx Designer to connect to various data sources and to integrate and prepare data for analysis; designed and implemented data workflows to clean, transform, and enrich data from multiple sources.
- Performed detailed data analysis using Alteryx's advanced analytics tools; developed statistical and predictive models to uncover insights and support business decisions.
- Created and customized interactive dashboards and reports using Alteryx and integrated visualization tools, providing clear, actionable insights to stakeholders through data visualizations.
- Designed, built, and optimized data workflows to improve efficiency and accuracy; automated repetitive tasks and processes to streamline data analysis and reporting.
- Worked closely with cross-functional teams, including data scientists, business analysts, and IT professionals, to understand business requirements and translate them into analytical solutions.
- Ensured data accuracy and integrity by performing data validation and quality checks; addressed and resolved data quality issues to maintain reliable analysis.
- Leveraged Alteryx's suite of tools, including Alteryx Designer, Alteryx Server, and Alteryx Connect, to perform data analysis and share insights; provided training and support to team members on best practices and tool usage.

Education

Master of Science in Data Science, Worcester Polytechnic Institute, Worcester, MA
Bachelor of Engineering in Computer Engineering