Candidate's Name
Street Address
+1 (646) 899-4670
EMAIL AVAILABLE
https://LINKEDIN LINK AVAILABLE

EDUCATION

Columbia University, New York, NY
M.S. in Data Science, GPA: 4.0/4.0, Sep. 2022 - Feb. 2024
Relevant Courses: Natural Language Processing, Reinforcement Learning, Modern Recommendation Systems

Wuhan University, Wuhan, China
B.S. in Information Management and Information System, GPA: 3.85/4.0 (rank: 5/65), Sep. 2018 - Jun. 2022
Relevant Courses: Intelligent Systems, Machine Learning, Data Mining, C, Python, Data Structures, Databases
Honors: Merit-based scholarship in the 2019-2020 academic year, Merit Student Award in the 2020-2021 academic year

SKILLS AND TECHNOLOGIES

Programming Languages: Python, Java, C, R, SQL, HTML, CSS, JavaScript, Shell, MATLAB
AI Frameworks: PyTorch, TensorFlow, Keras, Scikit-learn, XGBoost, Transformers, spaCy, NLTK, OpenAI Gym
Development Frameworks: Flask, Django, Spring Boot, Bootstrap, Vue.js
Statistics and Visualization Tools: SPSS, SAS, NumPy, SciPy, Tableau, Power BI, D3.js, ECharts, ggplot, Matplotlib

EXPERIENCE

Columbia University, New York, NY
Research Assistant, Jun. 2023 - Present
- Conducted humor detection with PyTorch by fine-tuning LLMs (Pythia, GPT-2, RoBERTa) on a human-made parallel corpus (funny texts paired with corresponding unfunny versions), which outperformed training on a simple corpus by 7%.
- Expanded the training set with a GPT-4-generated parallel corpus, improving the fine-tuned LLMs' performance by 10%.
- Explored how an LLM's intrinsic ability (measured by perplexity) to detect humorous texts changes as model size grows.

DataLynn, New York, NY
Data Scientist Internship, Jun. 2023 - Aug. 2023
- Wrote Python-based standard solutions for 30+ data challenges with Scikit-learn and Transformers, incorporating feature engineering, ensemble methods, regularization, dimensionality reduction, and hyperparameter tuning.
- Engineered prompts for commercial LLMs (ChatGPT, Colossal-AI) to automatically generate comprehensive data challenges covering a wide range of topics (e.g., fraud detection, recommendation systems, sales prediction).
- Crawled and analyzed interview questions for data scientists and machine learning engineers across 10 industries.

L'Oreal, New York, NY
Machine Learning Engineer Internship (Capstone), Jan. 2023 - May 2023
- Fine-tuned T5 models with PyTorch on GCP, using the WikiSQL and Spider datasets, to translate spoken questions into executable SQL queries, reaching an execution accuracy of 70% and an exact-match score of 50%.
- Benchmarked the text-to-SQL task with OpenAI Codex (code-davinci-002), achieving an execution accuracy of 80% and an exact-match score of 70%.
- Performed exploratory analysis of the corpora with correlation analysis, frequency analysis, and distribution visualization of query commands and question words using NumPy and Matplotlib.

Midea Group, Foshan, China
Backend Development Internship, Aug. 2021 - Nov. 2021
- Built, maintained, and monitored a website with Python and Flask that lets more than 100,000 users apply for access to large enterprise databases and get customized data views of them.
- Designed and built relational databases using MySQL, implemented read and write operations with SQLAlchemy, and improved access speed by caching frequently and previously retrieved data in Redis.
- Used Celery to asynchronously handle long-running retrieval tasks (fetching users' permissions, querying databases) and tasks with external dependencies (submitting users' applications, sending error messages by email).

Zhongnan University of Economics and Law, Wuhan, China
Research Assistant, Jan. 2021 - Jun. 2021
- Identified rumors by training XGBoost, SVM, LSTM, BERT, and ERNIE models with PyTorch, reaching 91% accuracy.
- Deployed a website named Weibo Rumor Detection, built with Python Flask, HTML, and CSS, achieving over 500 daily hits.
- Scraped official Chinese rumor texts and factual texts from the Weibo Rumor Clarification Website using a Python crawler.