Candidate's Name
EMAIL AVAILABLE
Street Address

EDUCATION

Stevens Institute of Technology
Master of Engineering - Applied Artificial Intelligence, Aug 2019 - May 2022

Hunan University of Science and Technology
Bachelor of Engineering - Electrical Engineering and Automation, Sep 2014 - Jun 2018

WORK EXPERIENCE

Data Scientist Intern
VisionX LLC, San Jose, CA, Jun 2023 - Present
- Collaborated with team members to develop a Digital Human Intelligent Dialogue System, integrating large language models such as Linly and Qwen with visual models such as SadTalker and GeneFace++ to support high-quality dialogue and visual generation.
- Implemented real-time speech recognition and video captioning, enabling users to interact naturally with the digital human by voice.
- Deployed YOLO models on real-time webcam feeds in retail environments to detect shoplifting, significantly reducing financial losses.
- Used a Vision Transformer (ViT) model to identify plant diseases in the New Plant Diseases Dataset from Kaggle, achieving 99.47% test accuracy on 87K RGB images of healthy and diseased leaves.
- Participated actively in research on anomaly detection algorithms, expanding the team's knowledge and capabilities.

Data Science Intern
Global AI, New York, NY, Nov 2022 - Feb 2023
- Improved data collection efficiency by 15% by developing a Python web scraper that aggregated data from multiple sources for a comprehensive BI solution.
- Transformed raw crawler data to conform to a predefined database schema, optimizing it for efficient querying in a database/data warehouse (DB/DW) environment.
- Built real-time data pipelines with AWS services, including EMR, Glue, Kinesis, Redshift/Redshift Spectrum, and Athena, to streamline data processing.
- Designed and implemented an interactive data visualization dashboard and analytical framework in Power BI, improving data accessibility and insight.

PROJECTS

San Francisco Crime Analysis in Apache Spark, Nov 2022 - Feb 2023
- Analyzed San Francisco crime data with Apache Spark, processing large datasets to extract actionable insights.
- Used Spark SQL, RDDs, DataFrames, and Spark Streaming to analyze crime data in real time.
- Stored large data volumes in Apache Hadoop's HDFS for efficient processing within the Spark environment.
- Developed a multi-stage crime data analysis pipeline covering data cleaning, transformation, and aggregation.
- Built machine learning models with Spark MLlib to predict future crime trends and identify high-risk areas.
- Visualized results with Matplotlib, Seaborn, and Plotly to communicate insights to stakeholders.
Natural Language Processing and Topic Modeling, Jun 2022 - Nov 2022
- Applied natural language processing (NLP) and topic modeling techniques to unlabeled watch reviews, deriving insights into consumer behavior and purchasing decisions.
- Preprocessed the review data with tokenization, stemming, and TF-IDF vectorization.
- Applied unsupervised learning models, such as K-means clustering and Latent Dirichlet Allocation (LDA), to identify latent topics and cluster the reviews.
- Analyzed the identified topics and extracted significant keywords to better understand consumer insights.
- Used Python libraries, including NLTK, spaCy, Gensim, and scikit-learn, for text analysis, feature extraction, and model training.
- Developed a scalable topic modeling and analysis pipeline capable of processing large volumes of text data in real time.

Bank Customer Churn Prediction, Jul 2022 - Aug 2022
- Developed a churn prediction system using supervised learning models to identify bank customers at risk of leaving.
- Evaluated model performance with Receiver Operating Characteristic (ROC) curves and the Area Under the Curve (AUC) to assess how well the models separated churners from non-churners.

Movie Recommendation, Nov 2021 - Feb 2022
- Built a basic recommendation system to understand its underlying mechanics.
- Performed online analytical processing (OLAP) on a movie rating dataset using Spark SQL.
- Generated personalized movie recommendations with the Alternating Least Squares (ALS) model.
- Tuned hyperparameters with ParamGridBuilder, achieving an RMSE of 0.8798 on the test data.

SKILLS
Programming Languages: Python, Java, SQL
Databases: MySQL, Oracle, MongoDB
Big Data: Hadoop, Spark
Data Analytics: A/B Testing, Hypothesis Testing
Data Visualization: Tableau, Power BI
Machine Learning: TensorFlow, Keras, PyTorch, scikit-learn