Candidate Information
Title: Machine Learning Data Scientist
Target Location: US-OR-Hillsboro

Tapan Ch
Lead/Sr. Data Scientist, Machine Learning Engineer
PHONE NUMBER AVAILABLE
EMAIL AVAILABLE
https://LINKEDIN LINK AVAILABLE

PROFESSIONAL SUMMARY:
- Professionally qualified Data Scientist with experience in Machine Learning/Deep Learning and Analytics, including Data Mining, Deep Learning, Natural Language Processing (NLP), Recommendation Systems, Statistical Analysis and Computer Vision.
- Involved in the entire data science project life cycle and actively involved in all phases, including data cleaning, data extraction and data visualization with large sets of structured and unstructured data; created ER diagrams and schemas.
- Experienced in application development in R, data visualization, R Shiny, reporting and web scraping.
- Proficient in Tableau and R Shiny data visualization tools to analyze and obtain insights into large datasets and to create visually powerful, actionable interactive reports and dashboards.
- Experienced in implementing various NLP libraries and techniques such as NLTK, TF-IDF, lemmatization, stemming, CountVectorizer and Word2Vec (a short sketch follows this summary).
- Experienced with Gen AI and machine learning algorithms such as logistic regression, KNN, SVM, random forest, neural networks, linear regression, Naive Bayes, lasso regression and k-means.
- Experienced in implementing Computer Vision using OpenCV, YOLO and PIL, and used deep learning libraries such as Keras, OpenCV and PyTorch to extract features from images.
- Implemented bagging and boosting to enhance model performance.
- Experienced in deploying deep learning models and neural networks on GPUs using TensorFlow-GPU and NVIDIA Tensor Cores.
- Experienced in various Microsoft Azure services such as Azure DevOps, Azure Data Factory (ADF), Azure Data Pipeline and Logic Apps.
- Experienced in implementing Computer Vision using various Gen AI/ML libraries and algorithms such as OpenCV, YOLO, Keras, TensorFlow, PIL (Python Imaging Library) and Haar cascades (XML Haar files).
- Experienced in writing test cases in Python using Selenium.
- Experienced in implementing data analysis with various analytic tools, such as Anaconda, Jupyter Notebook 4.x, R 3.0 (ggplot2, dplyr, caret) and Excel.
- Used federated learning to train machine learning models on client machines and received the output on the local server.
- Implemented recommendation systems, mainly using collaborative filtering and content-based filtering.
- Experienced in implementing language models and word embeddings using Word2Vec and GloVe.
- Experienced in web development with Python frameworks such as Django, and with REST and SOAP APIs.
- Ability to maintain a fun, casual, professional and productive team atmosphere.
- Experienced in the full software life cycle under SDLC, Agile, DevOps and Scrum methodologies, including creating requirements and test plans.
- Worked on container-based technologies such as Docker and Kubernetes.
- Skilled in advanced regression modeling, correlation, multivariate analysis, model building, business intelligence tools and the application of statistical concepts.
- Experienced in using Python for data loading, extraction and manipulation, and worked with Python libraries such as Matplotlib, SciPy, NumPy and Pandas for data analysis.
- Expertise in transforming business requirements into algorithm designs, analytical models, data mining and reporting solutions that scale across massive volumes of structured and unstructured data.
- Skilled in data parsing, data manipulation, data architecture, data ingestion and data preparation with SAS DataFlux, with methods including describing data contents, computing descriptive statistics, regex, split and combine, merge, remap, subset, reindex, melt and reshape.
- Worked with NoSQL databases including HBase, Cassandra and MongoDB.
- Experienced in Big Data with Hadoop, MapReduce, HDFS, Apache Airflow and Spark.
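The NLP bullet above refers to a standard preprocessing-plus-vectorization flow. The following is a minimal, illustrative sketch only, assuming NLTK and scikit-learn; the sample documents are invented and are not data from any project described in this resume.

```python
import nltk
from nltk.stem import PorterStemmer, WordNetLemmatizer
from sklearn.feature_extraction.text import TfidfVectorizer

# Illustrative documents only; real projects used much larger corpora.
docs = [
    "The devices were failing after the firmware update",
    "Device failure rates dropped once the update was rolled back",
]

nltk.download("wordnet", quiet=True)  # lexicon needed by the lemmatizer
stemmer = PorterStemmer()
lemmatizer = WordNetLemmatizer()

def normalize(text):
    # Lowercase, lemmatize, then stem each whitespace-separated token.
    return " ".join(
        stemmer.stem(lemmatizer.lemmatize(tok)) for tok in text.lower().split()
    )

# TF-IDF turns the cleaned text into sparse numeric features for downstream models.
vectorizer = TfidfVectorizer(stop_words="english")
features = vectorizer.fit_transform([normalize(d) for d in docs])
print(features.shape)
```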
EDUCATION
Master of Science - Information Technology (2017)
Bachelor of Engineering - Computer Science (2012)

TOOLS AND TECHNOLOGIES
Big Data/Hadoop Technologies: Apache Spark, Airflow, Hive, Hadoop, Databricks, AWS, SaaS.
Languages: Python, Scala, SQL, JavaScript, HTML, CSS, R Shiny.
NoSQL Databases: Cassandra, HBase, MongoDB.
Data Analysis, Visualization and Business Intelligence Tools: Power BI, QlikView, Amazon Redshift, Azure Data Warehouse, R (OOP, dplyr, ggplot2, ggvis, Shiny, Shiny dashboard), Advanced Excel (PivotTables and charts, lookups, Solver, complex formulas, data analysis), Python, SAS, Tableau (integration with R).
Development Tools: Jupyter Notebook, Anaconda, PyCharm, Google Colab, Microsoft SQL Studio, IntelliJ, Eclipse, NetBeans, AWS, SageMaker, Redshift.
Development Methodologies: Agile/Scrum, UML, Design Patterns, Waterfall.
Build Tools: Apache Airflow, Jenkins, Toad, Control-M, Oozie, Informatica.
Reporting Tools: MS Office (Word/Excel/PowerPoint/Visio/Outlook), Crystal Reports XI, SSRS, Cognos 7.0/6.0.
Databases: Microsoft SQL Server 2008/2010/2012, MySQL 4.x/5.x, Oracle 11g/12c, DB2, Teradata, Netezza.
Operating Systems: Windows, UNIX, Linux, macOS.

PROFESSIONAL EXPERIENCE:

Client: AT&T, Plano, TX    Jan 2019 - Till Date
Role: Lead Data Scientist
Description: AT&T is a Fortune 10 American multinational conglomerate company with headquarters in Dallas, TX. The company primarily focuses on mobile telephone services and media entertainment through WarnerMedia.
Responsibilities:
- Worked on a project that takes in quality data from various sources to predict machine failures based on prior events using Python, R Shiny and machine learning; the data was obtained from telematics (IoT) devices located all over the world.
- Developed classification models using rules-based and deep learning algorithms to determine failures in machines.
- Designed and implemented ETL pipelines using AWS Glue to move and transform data between sources such as S3, RDS and Redshift, improving data processing time by 30% (a Glue job sketch follows this section).
- Performed deep-dive analysis leveraging statistical and exploratory data analysis, built dashboards and presented the results.
- Built classification models using tree-based models such as decision trees and random forests, and leveraged boosting techniques such as XGBoost, Gradient Boosting and AdaBoost.
- Developed classification models using rules-based and deep learning algorithms in PyTorch and TensorFlow to detect fraudulent phone upgrades.
- Collaborated with cross-functional teams to design and execute end-to-end NLP solutions using LLMs, contributing to improved text generation, sentiment analysis and language translation capabilities.
- Integrated LangChain's proprietary platform into the existing infrastructure, enabling seamless data processing and analysis for diverse linguistic contexts.
- Conducted extensive experiments and fine-tuning of LLMs for domain-specific tasks, achieving a significant reduction in model adaptation time and enhancing accuracy by 15%.
- Performed data preprocessing and feature exploration on multiple datasets and built chatbots using NLTK/NLP.
- Extracted text data from chat conversations and applied NLP techniques such as bag of words, TF-IDF, stemming and lemmatizing, Word2Vec and topic modeling to analyze the text and generate insights from it.
- Developed and fine-tuned Large Language Models (LLMs) such as GPT and BERT for various NLP applications, achieving state-of-the-art performance on text generation, summarization and sentiment analysis tasks.
- Used Kubernetes to orchestrate the deployment, scaling and management of Docker containers.
- Designed and implemented ETL (Extract, Transform, Load) processes using Teradata utilities to integrate and transform data from diverse sources for downstream analytics.
- Built scalable and economical Gen AI solutions that lie at the intersection of data science/machine learning, data management and software.
- Implemented transfer learning techniques to adapt pre-trained LLMs to domain-specific datasets, improving model accuracy by up to 15%.
- Optimized SQL queries and data processing workflows on Teradata for enhanced performance, leveraging indexing, partitioning and other optimization techniques.
- Created build scripts using Maven for Java projects and automated the build process by configuring Git.
- Strong proficiency in the implementation, integration and management of large language models, with an understanding of their capabilities and potential applications.
- Utilized Teradata's MPP architecture to parallelize analytical queries, improving processing speed and efficiency for large datasets.
- Applied advanced analytics techniques on the Teradata Aster Analytics platform, including machine learning algorithms, for predictive modeling and insight generation.
- Spearheaded a chatbot development initiative to improve customer interaction with the application and developed the chatbot using api.ai.
- Built advanced NLP models such as Named Entity Recognition (NER) for generating text labels and used those as features to train classification models.
- Used AWS SageMaker to perform activities such as data preparation, model training, model evaluation and model deployment.
- Built text classification models by cleaning the text and building a text corpus that could be used to train models.
- Developed AWS Glue Crawlers to automatically discover, catalog and classify structured and semi-structured data.
- Used API calls to generate the required data from various sources and perform data transformation.
- Built a deep learning CNN model for identifying device installations, helping with maintenance routines and reducing operational costs.
- Flattened and extracted data from unstructured sources such as NoSQL/MongoDB for analysis, building data pipelines and generating features.
- Leveraged Redshift to store large datasets, integrate and analyze data, and support collaborative work.
- Utilized the AWS Glue Data Catalog as a centralized metadata repository to store and manage table definitions and schemas, reducing maintenance overhead.
- Developed real-time, embedded Python software to decode, interpret and assemble raw neural network outputs into a form consumable by the planning and control stack.
- Implemented data transformations using PySpark in AWS Glue and leveraged AWS Glue triggers and job scheduling features to automate ETL workflows, ensuring timely data updates for applications and reporting.
- Optimized Glue ETL jobs using partitioning, bucketing and caching strategies, resulting in a 25% reduction in job execution time and cost.
Environment: Python 3, AWS, Glue, NLP, AWS SageMaker Studio, Snowflake, Tableau, ETL, Databricks, Spark, Keras, Git, OpenCV, PyTorch, TensorFlow, IoT, Azure DevOps, Kubernetes, Pandas, Computer Vision, Selenium, Gen AI, machine learning algorithms.
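The AWS Glue bullets above describe catalog-driven PySpark ETL. The following is a minimal, hedged sketch of such a Glue job, not the actual pipeline: the catalog database, table, column and S3 path names are placeholders rather than real AT&T resources.

```python
import sys
from awsglue.utils import getResolvedOptions
from awsglue.context import GlueContext
from awsglue.dynamicframe import DynamicFrame
from awsglue.job import Job
from pyspark.context import SparkContext

# Standard Glue job setup: resolve the job name passed in by the Glue service.
args = getResolvedOptions(sys.argv, ["JOB_NAME"])
glue_context = GlueContext(SparkContext())
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

# Read raw records from the Glue Data Catalog (placeholder database/table names).
raw = glue_context.create_dynamic_frame.from_catalog(
    database="telematics_raw", table_name="device_events"
)

# Transform with plain Spark: drop duplicates and rows missing an event type.
curated = raw.toDF().dropDuplicates().filter("event_type IS NOT NULL")

# Write the curated data back to S3 as partitioned Parquet (placeholder bucket,
# and the partition column is assumed to exist in the source table).
glue_context.write_dynamic_frame.from_options(
    frame=DynamicFrame.fromDF(curated, glue_context, "curated_events"),
    connection_type="s3",
    connection_options={"path": "s3://example-bucket/curated/",
                        "partitionKeys": ["event_date"]},
    format="parquet",
)

job.commit()
```

In practice, partitioning choices and trigger/scheduling configuration of this kind are what the runtime and cost reductions cited above would hinge on.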
Client: Baxter, Deerfield, IL    May 2018 - Jan 2019
Role: Sr. Data Scientist
Description: Baxter International Inc. is a Fortune 500 American health care company with headquarters in Deerfield, Illinois. The company primarily focuses on products to treat hemophilia, kidney disease, immune disorders and other chronic and acute medical conditions.
Responsibilities:
- Used R Shiny to create interactive dashboards that can later be used in web applications via HTML and CSS.
- Built a risk analytics web application to analyze the credit and investment risk of an Enhanced Equipment Trust Certificate using R (Shiny, Shiny dashboard and rhandsontable) and JavaScript.
- Designed and seamlessly integrated tailored Gen AI solutions within comprehensive full-stack applications.
- Deployed the application on the Microsoft Azure platform using Shiny Server running in a Docker container on an Ubuntu VM.
- Updated an R package which downloaded and processed real-time derivatives data from various repositories.
- Tested and determined whether new technology is potentially useful for USAA.
- Extracted operations data from Workday and stored it in Hadoop.
- Developed a proof-of-concept collaborative filtering recommendation engine using scikit-learn (a sketch follows this section).
- Built a graph database using DataStax in Python.
- Queried data from the graph database using Gremlin to support model validation.
- Worked in a group to develop machine learning guidance for model development and validation.
- Worked on Tableau to visualize customer data and produce reports.
- Used graph databases to find relationships between the various customers.
- Drafted the procedures for reporting.
- Worked on ETL tools such as Apache Airflow and Jenkins.
- Conducted data modeling analysis to identify trends and patterns in supply chain data, resulting in a 15% reduction in inventory waste and a 10% increase in on-time deliveries.
- Created and deployed Kubernetes pod definitions, tags, labels and multi-pod container replication; managed scaling and auto-scaling of multiple Kubernetes pod containers.
- Managed Git using the Nexus tool to automate the build process and used the same to share snapshots and releases of internal projects.
- Developed predictive models using machine learning algorithms to forecast demand and optimize production schedules, resulting in a 20% reduction in production costs and a 5% increase in overall revenue.
- Collaborated with cross-functional teams to integrate NLP solutions into production systems, resulting in a 20% reduction in manual effort for text analysis tasks.
- Developed and fine-tuned large language models (GPT-3, T5) for text generation, summarization and translation tasks, resulting in a 20% increase in task accuracy.
- Collaborated with cross-functional teams to design and implement data-driven solutions to business problems, resulting in increased efficiency and cost savings across the organization.
- Conducted A/B testing and analysis to evaluate the impact of new business initiatives and recommend areas for improvement.
- Created data visualizations and dashboards to communicate insights and findings to stakeholders, resulting in increased buy-in and adoption of data-driven decision making.
- Built an auto-DAG system that automatically triggers the workflow whenever Airflow receives an HTTP request.
- Wrote Python scripts to parse documents.
- Did intensive research on tools such as graph networks, scheduling tools, SQL IDEs and data wrangling tools to determine the best available fit for the company's needs.
- Took online courses as a means to continuously learn new subjects.
Environment: Windows, Python 3, Tableau, R Shiny, Shiny Dashboard, DataStax, GraphX, NLP, Azure DevOps, Git, Kubernetes, NLTK, Apache Airflow, Jenkins, SQL, Jupyter Notebook.
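The collaborative-filtering bullet above corresponds to a common item-based pattern. The sketch below assumes scikit-learn's NearestNeighbors with cosine distance and uses a toy ratings matrix with made-up user and item names, not Baxter data.

```python
import pandas as pd
from sklearn.neighbors import NearestNeighbors

# Toy user-item ratings matrix; a real engine would build this from usage logs.
ratings = pd.DataFrame(
    [[5, 3, 0, 1],
     [4, 0, 0, 1],
     [1, 1, 0, 5],
     [0, 0, 5, 4],
     [0, 1, 5, 4]],
    index=["u1", "u2", "u3", "u4", "u5"],
    columns=["item_a", "item_b", "item_c", "item_d"],
)

# Item-based collaborative filtering: two items are "similar" when the same
# users rate them in similar ways, measured here with cosine distance.
model = NearestNeighbors(metric="cosine", algorithm="brute")
model.fit(ratings.T.values)  # one row per item

distances, indices = model.kneighbors(ratings.T.loc[["item_a"]].values, n_neighbors=3)
print("Items similar to item_a:", list(ratings.columns[indices[0][1:]]))
```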
Client: DHHS, South Carolina    Mar 2017 - April 2018
Role: Data Scientist
Description: The United States Department of Health and Human Services is a cabinet-level executive branch department of the U.S. federal government created to protect the health of the U.S. people and to provide essential human services.
Responsibilities:
- Co-developed a data validation framework to generate analysis and visualization reports.
- Utilized SPSS and Minitab statistical software to track and analyze data.
- Optimized data collection procedures and generated reports on a weekly, monthly and quarterly basis.
- Used advanced Microsoft Excel to create pivot tables and pivot reporting, as well as the VLOOKUP function.
- Used Tableau and R Shiny for data visualization and report creation.
- Developed R scripts for estimating the probability of default of an Enhanced Equipment Trust Certificate between cash flows, based on the issuer's rating.
- Built an end-to-end sentiment analysis pipeline using BERT, achieving an accuracy of 88% on a large-scale social media dataset.
- Implemented severity report notifications (email, SMS) to clients for on-time data loading and table validation.
- Benchmarked visual trending reports to provide quantified analysis for client reports.
- Involved in gathering requirements from the client and estimating a timeline for developing complex Hive queries for logistics applications.
- Created Hive schemas using performance techniques such as partitioning and bucketing.
- Involved in maintaining the infrastructure on the AWS cloud.
- Worked on several independent projects in supervised/unsupervised/reinforcement learning, computer vision, time series, deep learning and Natural Language Processing (NLP) using Python.
- Developed an ETL pipeline using Apache Spark and Jenkins and deployed it on the AWS cloud.
- Experienced in deploying machine learning models into production.
- Developed a spam/non-spam classifier using NLP: raw text files were first loaded into a Pandas data frame, NLP techniques such as lemmatization, stemming and TF-IDF were applied to clean the text, and the result was fed to machine learning algorithms to predict spam and non-spam emails.
- Built a customer segmentation project using clustering algorithms such as K-Means and Gaussian Mixture Models (a sketch follows this section).
- Developed various interactive dashboards using R Shiny.
- Experienced in using workflow management tools such as Apache Airflow and Oozie.
- Experienced in using Python frameworks such as Flask for microservices and Django for web development.
- Experienced in using Spark transformations with Databricks.
Environment: Python, Databricks, AWS, R Shiny, Apache Airflow, macOS/Ubuntu, NLP, Tableau, Flask, Django, NumPy, Pandas, Clustering, XGBoost, SVM, Keras, PyTorch, Computer Vision, Pivot Tables, Jenkins.
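The customer-segmentation bullet above pairs K-Means with a Gaussian Mixture Model. The following minimal sketch uses scikit-learn on synthetic stand-in features, so the feature layout and cluster count are illustrative assumptions only.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans
from sklearn.mixture import GaussianMixture

# Synthetic stand-in for customer features (e.g., recency, frequency, spend).
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(loc=[5, 2, 50], scale=[1, 1, 10], size=(100, 3)),
    rng.normal(loc=[60, 20, 500], scale=[10, 5, 50], size=(100, 3)),
])
X_scaled = StandardScaler().fit_transform(X)

# K-Means gives hard assignments to the nearest of k centroids.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X_scaled)

# A Gaussian Mixture Model instead gives soft, probabilistic memberships.
gmm = GaussianMixture(n_components=2, random_state=0).fit(X_scaled)
membership = gmm.predict_proba(X_scaled)

print("K-Means cluster sizes:", np.bincount(kmeans.labels_))
print("First customer's GMM membership probabilities:", membership[0].round(3))
```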
Client: Bank of New York Mellon, NYC    July 2016 - Feb 2017
Role: Sr. Data Analyst
Description: Bank of New York Mellon is a secure financial institution that serves a large base of customers. The bank offers various financial and banking services to its customers. The current application is part of online banking and allows a customer to pay bills securely as well as enroll in monthly automatic recurring bill payments. The functionalities involved in e-bill payment are: add a payee, make a payment, set up automatic payments, receive bills electronically, request e-mail notifications and review payments.
Responsibilities:
- Applied various machine learning algorithms and statistical models such as decision trees, text analytics, Natural Language Processing (NLP), regex, RNN, LSTM, supervised and unsupervised learning, regression models, social network analysis, neural networks, deep learning, SVM and clustering to identify volume, using the scikit-learn package in Python and MATLAB.
- Built a sentiment analytics engine entirely in Python 3; email chains were considered for analysis, with emails pulled into the database by the Email2DB tool.
- Pre-processed each email transaction; sentiment, subjectivity and polarity were calculated and linked to the CSAT score for dashboarding.
- Used Emite as the tool of choice for visualization.
- Used RapidMiner and PredictionIO as the tools of choice for predictive analysis.
- Used Azure DevOps components such as Azure Pipelines for testing, CI/CD and deployment.
- Used ITSM data to build an engine for predicting CPU failures, memory issues, disk space issues and server failure errors.
- Built training data from proactive and reactive ticket data, which in turn was used to build a classification model for prediction.
- Identified and evaluated potential vendors for technology fitment, performed proof-of-value exercises and conducted commercial and contract negotiations.
- Performed data profiling to learn about behavior across features such as traffic pattern, location, date and time.
- Implemented word embeddings using Word2Vec, Gensim and GloVe (a sketch follows this section).
- Categorized comments from different social networking sites into positive and negative clusters using sentiment analysis and text analytics.
- Performed multinomial logistic regression, decision tree, random forest and SVM modeling to classify whether a package would be delivered on time for a new route.
- Performed data cleaning, feature scaling and feature engineering using the Pandas and NumPy packages in Python.
- Performed data cleaning and feature selection using the MLlib package in PySpark, and worked with deep learning frameworks such as TensorFlow and PyTorch.
- Developed Spark/Scala, R Shiny, Jenkins and Python components for a regular expression (regex) project in the Hadoop/Hive environment on Linux/Windows big data resources.
- Used the K-Means clustering technique to identify outliers and classify unlabeled data.
- Communicated the results to the operations team to support decision making.
- Collected data needs and requirements by interacting with other departments.
Environment: Python 2.x, CDH5, HDFS, Hadoop 2.3, Hive, Impala, AWS, Linux, Spark, Tableau Desktop, SQL Server 2014, Microsoft Excel, Apache Airflow, Jenkins, SSIS, R Shiny, MATLAB, Spark SQL, Azure, PySpark, Jupyter Notebook.
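The word-embedding bullet above references Gensim's Word2Vec. The sketch below uses a toy tokenized corpus that only echoes the payment/ITSM themes and assumes the Gensim 4.x API, where the dimensionality parameter is vector_size (older releases call it size).

```python
from gensim.models import Word2Vec

# Toy tokenized corpus; in practice tokens came from pre-processed email/ticket text.
sentences = [
    ["payment", "failed", "card", "declined"],
    ["payment", "processed", "successfully"],
    ["server", "memory", "issue", "reported"],
    ["disk", "space", "issue", "on", "server"],
]

# Train a small skip-gram model; vector_size is the embedding dimensionality.
model = Word2Vec(sentences, vector_size=50, window=3, min_count=1, sg=1, epochs=100)

# The learned embeddings can feed downstream classifiers or similarity lookups.
print(model.wv.most_similar("server", topn=2))
```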
Client: Sun Pharmaceutical, Mumbai, India    Nov 2013 - Aug 2015
Role: Data Analyst/Scientist
Description: Sun Pharmaceutical Industries Limited is an Indian multinational pharmaceutical company headquartered in Mumbai, Maharashtra, that manufactures and sells pharmaceutical formulations and active pharmaceutical ingredients.
Responsibilities:
- Implemented the database to store questions, possible answers, correct answers and user scores, and queried it using SQL.
- Used SPSS to track and analyze data.
- Created visually impactful dashboards in Excel for data reporting using pivot tables.
- Extracted, interpreted and analyzed data to identify key metrics and transform raw data into meaningful, actionable information.
- Collected and analyzed data from different data sources and used it for analysis.
- Prepared reports that interpret consumer behavior, marketing results and trends.
- Queried the MySQL database using the MySQL Connector and MySQLdb packages to retrieve information.
- Created and optimized diverse SQL queries to validate data accuracy and ensure database integrity.
- As a SQL Server developer, worked closely with application developers to ensure proper design and implementation of database systems.
Environment: SQL, SPSS, Excel, MySQL DB, Oracle 11g.

Client: Hidden Brains, Hyderabad, India    June 2012 - Oct 2013
Role: Data Analyst
Description: Hidden Brains Infotech Pvt. Ltd. is an enterprise web and mobile app development company. With industry experience of over a decade, the company offers a plethora of client-centric services, enabling customers to achieve competitive advantage through flexible, next-generation global delivery models.
Responsibilities:
- Responsible for development of an application from scratch.
- Developed web pages using JSP and servlets.
- Created the GUI using HTML, CSS, JSP, JavaScript and jQuery.
- Developed and implemented servlets and Java Beans.
- Designed and built signup and login pages using HTML and JavaScript, and saved user information to the database.
- Responsible for creating, updating, reading and deleting tables in the database as per requirements.
- Involved in testing servlets and JSP using JUnit.
- Involved in connecting the database with servlets via JDBC.
Environment: Core Java, SQL, HTML, CSS, JavaScript, Apache Tomcat, Servlet, Eclipse, JUnit, JSP, jQuery, JDBC, Windows.
