Data Scientist Software Engineer Resume ...
Candidate Information
Title	Data Scientist Software Engineer
Target Location	US-TX-Austin
Candidate's Name
LinkedIn Profile | Austin, TX | H: PHONE NUMBER AVAILABLE | EMAIL AVAILABLE

Education
Master of Science: Applied Mathematics and Statistics, Stony Brook University, New York, 2016-2018
Coursework: Machine Learning, Forecasting, NLP, Algorithms, Big Data Analytics, Probability, Statistics, Linear Algebra, Differential Calculus.
Bachelor of Engineering: Electronics and Communication Engineering, JNTU, India, 2006-2010
Coursework: Signal Processing, Control Systems, Logic Design, Computer Networks, Microcontrollers, C++, and Data Structures.

Work Experience
Data Scientist, AT&T - HBO Max (Consultant) (Python, LangChain, LLMs, AWS, Pandas, NumPy, PyTorch) 04/2023 to Present
LLM-Powered Conversational AI: Personalized Virtual Assistant for Increasing User Engagement
Transformer-based Model Architecture: Developed the LLM core from decoder blocks with multi-head self-attention, and integrated layers that handle text and voice user input, responses, and user feedback through REST APIs connected to various content management systems.
Pre-training and Fine-tuning: Leveraged existing foundation models such as GPT for general conversational scope and augmented them with HBO Max-specific datasets, including user engagement data, show transcripts, support logs, and content metadata. Iterated through several development cycles, balancing model size against dataset size for a given compute budget following the Chinchilla paper's guidelines.
Prompt Engineering and LangChain: Developed zero-shot, one-shot, and few-shot prompting techniques with LangChain's dynamic prompt templates to adapt the assistant to new conversational domains with little training data while keeping response latency low.
Evaluation Metrics and Alignment: Aligned the model with ethical guidelines and fair-use policies, which included reducing bias and hallucinations and balancing response depth against quality. Achieved a BLEU score of 62 and a ROUGE score of 80 after multiple iterations.

Data Scientist, Spotify (Python, spaCy, PyTorch, Docker, Kubernetes, GCP, Pandas, NumPy, Tableau) 03/2021 to 02/2023
Playlist Curation with 91% Accuracy using Recommender Systems, NLP, and Reinforcement Learning Algorithms in PyTorch
NLP: Developed text-driven playlist models using multiple pretrained and customized Transformer-based NLP models, including text classification of news articles, entity extraction from song lyrics, and summarization of social blog content, to generate diverse and informative playlists.
Unsupervised User Classification: Created weekly playlists using Self-Organizing Maps (SOMs), matrix factorization, SVD, and other unsupervised algorithms, including deep-learning-based ones, applied to vector representations of user profiles and metadata for personalized content.
Reinforcement Learning: Balanced exploration and exploitation with an epsilon-greedy multi-armed bandit strategy to optimize the RL objective function and extract actionable insights.
Improved Podcast Advertising ROI by 12% through Ad-Spend Optimization using Classical and Bayesian MMMs and MTA Techniques
Classical Marketing Mix Models: Shifted marketing strategy toward positive ROI using MMM techniques such as regression analysis, time-series analysis (classical and recent methods), and machine learning models including decision trees, random forests, and boosting methods.
Bayesian Marketing Mix Models: Optimized marketing spend, supplementing Nielsen's and other third-party data segmentations with controlled experiments, by modeling carryover, geo-decay, ad-stock transformation, and shape effects with MCMC and other algorithms.
Multi-Touch Attribution: Applied time-decay, position-based, linear, algorithmic, and custom attribution models to compare host-read ads, programmatic and direct-sold ads, sponsorships, and affiliate marketing across streaming ad insertion and dynamic ad insertion.
Provided business insights using inferential statistics and built Tableau KPI dashboards for sales and marketing teams to support day-to-day metric monitoring, feedback, and recommendations.

NLP Data Scientist, Bank of America (Python, spaCy, PyTorch, AWS, REST API, Flask, Pandas, OCR) 05/2019 to 02/2021
Multi-label Text Classification: Indexing Unstructured Financial Documents with Deep Learning in PyTorch
Implemented multi-label text classification to sort incoming customer documents into 15 categories using an ensemble of Logistic Regression, XGBoost, and SVM classifiers over Bag-of-Words, TF-IDF, and Word2Vec features.
Applied deep learning models in PyTorch, including Transformers, BERT, and its variants, reaching 96% accuracy from an 87% baseline set by Word2Vec with an RNN-LSTM.
Enhanced data preprocessing, hypothesis testing, EDA, feature engineering, dimensionality reduction, SMOTE oversampling, and hyperparameter optimization.
Hierarchical Clustering using Dynamic Time Warping (DTW) for Time-Series Clustering
Built clustering models with DTW as the distance metric using TensorFlow and Keras on the GCP stack, grouping the time-series consumption patterns of products such as music, podcasts, and their combinations so that each cluster could then be modeled with a cluster-specific forecasting model for more precise forecasts.
Used autoencoders built from 1D-CNN and Bi-LSTM layers to compress the feature space and ease convergence.

Data Scientist, JPMorgan Chase & Co. 07/2018 to 05/2019
Credit Risk Evaluation on Alternative Data using Decision Trees and Neural Networks in Python (scikit-learn)
Developed a classification model that predicts the probability of default using a fine-tuned ensemble of Logistic Regression, Decision Trees, and Neural Networks. By incorporating the sensitivity of the loss-given-default rate to systemic risk, the model outperformed the existing Logistic Regression, Random Forest, and KNN models by 8%, reaching 93% accuracy.

Software Engineer, Citibank, India 02/2011 to 01/2016
Performed Market Basket Analysis with association rule mining to segment customers and products precisely, and used linear, quantile, and polynomial regression to better understand their purchase patterns and preferences.
Designed and built web applications with Python, Java, and SQL, and implemented data warehouse models with SQL Server, Oracle, Teradata, Informatica, and other ETL tools.

Quantitative Modeling
Quantitative Researcher, Stony Brook University, NY 01/2018 to 06/2018
Forecasting Commodity Prices using Multivariate Bayesian Regression
Built a quantitative model to predict commodity prices using technical and macroeconomic factors as the feature space, with a recursive feature engineering technique. The Multivariate Bayesian Regression achieved a goodness-of-fit value of 0.996, outperforming Linear Regression, AdaBoost, Decision Tree, Ridge Regression, and Support Vector Regression estimates.

Skills
Programming: Python, Pandas, dbt, GCP, PyTorch, AWS, Unix, C, PostgreSQL, SQL, NoSQL, SQL Server, Django, HTML, and CSS.
Data Science: Probability, Inferential Statistics, Linear Algebra, Advanced Calculus, Hadoop, and Big Data Analytics.