Live, travel, adventure, bless
and don't be sorry

Hold firmly to your wish and head straight toward your destination,
so you will never lose your way.
Focus on your great journey, and you will discover that
between sunrise and moonset
you have seized every opportunity to approach your goal.

Data Scientist / Ph.D.
3 years of experience
in Machine Learning, NLP

I have three years of professional experience as a Data Scientist, during which I have worked on projects involving machine learning, natural language processing (NLP), and data analysis. Most recently, I developed a chatbot for a major media company, where I focused on improving its inclusiveness and robustness. I used NLP tooling from the LangChain library, such as Hugging Face embeddings and a FAISS vector index, and applied multiple tools to enhance the chatbot's performance.

In addition to my practical experience, I hold a Ph.D. in Chemical and Bio Engineering from the University of British Columbia. I previously worked as a research scientist/assistant in the field of water treatment. During that time, I had the opportunity to work with machine learning and was inspired by its potential to solve complex problems. This led me to explore the field of data science, and I now enjoy my current role at Beam Data.

Experience is wealth. I believe that my past experiences have made me more aware of myself and pushed me to keep getting better.

Project Showcase

Solid experience with regression, classification, and NLP models

Online Shopping Purchase Prediction - Regression Model

This project applied an XGBoost regression model to predict the likelihood of a purchase based on customers' online shopping behavior. In addition, a recommendation model combining collaborative filtering and content-based filtering was built using customer purchase transaction data.
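
To illustrate the collaborative-filtering half of such a recommender, here is a minimal sketch of item-based filtering with cosine similarity. It is not the project's code: the function names, the `(customer, item)` transaction format, and the sample data are all assumptions made for illustration, and it uses only the Python standard library.

```python
import math
from collections import defaultdict


def item_vectors(transactions):
    """Map each item to the set of customers who bought it."""
    buyers = defaultdict(set)
    for customer, item in transactions:
        buyers[item].add(customer)
    return buyers


def cosine(a, b):
    """Cosine similarity between two binary vectors stored as sets."""
    if not a or not b:
        return 0.0
    return len(a & b) / math.sqrt(len(a) * len(b))


def recommend(transactions, customer, top_n=3):
    """Rank items the customer has not bought by similarity to owned items."""
    buyers = item_vectors(transactions)
    owned = {item for c, item in transactions if c == customer}
    scores = {}
    for candidate in buyers:
        if candidate in owned:
            continue
        score = sum(cosine(buyers[candidate], buyers[o]) for o in owned)
        if score > 0:
            scores[candidate] = score
    # Highest score first; ties broken alphabetically for stable output.
    return sorted(scores, key=lambda i: (-scores[i], i))[:top_n]


# Hypothetical purchase history, invented for this example.
transactions = [
    ("ann", "shoes"), ("ann", "socks"),
    ("bob", "shoes"), ("bob", "socks"), ("bob", "laces"),
    ("cat", "hat"),
]
print(recommend(transactions, "ann"))
```

A content-based component would score candidates by item attributes instead of by shared buyers, and the two scores could then be blended.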

Product Review - Topic Clustering and Classification

This project combined web scraping, clustering, and classification models to evaluate course reviews for an online education platform. For the ML models, KMeans was applied to cluster 180 course types into 9 distinct segments, and XGBClassifier was applied to predict course types to complete the dataset for further analysis.
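
For readers unfamiliar with KMeans, the clustering step can be sketched as below. This is a simplified, standard-library version of Lloyd's algorithm, not the project's implementation: the `kmeans` function, its first-k-points initialization, and the toy 2-D points are assumptions for illustration (library implementations use smarter seeding and real feature vectors).

```python
import math


def kmeans(points, k, n_iter=20):
    """Lloyd's algorithm on a list of equal-length numeric tuples.

    Centroids start at the first k points, which is a simplification.
    """
    centroids = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append(
                    tuple(sum(col) / len(members) for col in zip(*members))
                )
            else:
                new_centroids.append(centroids[j])  # keep an empty cluster in place
        if new_centroids == centroids:
            break  # converged
        centroids = new_centroids
    return labels, centroids


# Toy data, invented for this example: two well-separated groups.
points = [(0.0, 0.0), (10.0, 10.0), (0.0, 1.0), (10.0, 11.0)]
labels, centroids = kmeans(points, k=2)
print(labels, centroids)
```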

Sentiment Analysis - NLP and Classification

This project was run in Databricks using Spark to analyze recent news about 'cancer' for sentiment evaluation. The goal was to apply traditional NLP techniques such as tokenization, stop-word removal, count vectorization (CV), TF-IDF, and N-grams for text analysis. The project also used tools such as AWS S3, Athena, and QuickSight to handle large-scale data.
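
The text-processing steps named above (tokenization, stop-word removal, N-grams, counting, TF-IDF) can be sketched in plain Python as follows. This is an illustration only and not the Spark pipeline used in the project: the function names, the tiny stop-word list, and the sample sentences are assumptions, and the weighting is the classic `tf * log(N / df)` form rather than the smoothed variants some libraries use.

```python
import math
import re
from collections import Counter

# Hypothetical stop-word list for illustration; real pipelines use a fuller one.
STOP_WORDS = {"a", "an", "the", "of", "in", "for", "and", "to", "is", "are"}


def tokenize(text):
    """Lowercase, split on non-letters, and drop stop words."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOP_WORDS]


def ngrams(tokens, n_max=2):
    """Return all n-grams of length 1 to n_max, joined with spaces."""
    return [
        " ".join(tokens[i:i + n])
        for n in range(1, n_max + 1)
        for i in range(len(tokens) - n + 1)
    ]


def tf_idf(docs, n_max=2):
    """Return one {term: weight} dict per document."""
    counts = [Counter(ngrams(tokenize(d), n_max)) for d in docs]  # count vectors
    doc_freq = Counter(term for c in counts for term in c)
    n_docs = len(docs)
    weights = []
    for c in counts:
        total = sum(c.values())
        weights.append({
            term: (k / total) * math.log(n_docs / doc_freq[term])
            for term, k in c.items()
        })
    return weights


# Sample headlines, invented for this example.
docs = ["Cancer drug trial", "The cancer study"]
for row in tf_idf(docs):
    print(row)
```

A term that appears in every document, such as "cancer" here, gets a weight of zero, which is why TF-IDF highlights the words that distinguish one article from another.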

Skills and Qualifications

Programming Languages: SQL • Python (Pandas, NumPy, SciPy)
Machine Learning: Scikit-Learn • Classification • Regression • Clustering • PCA • SparkML
NLP • LangChain • HuggingFace • Keras • TensorFlow • PyTorch • LFQA
Modeling: XGBoost • RandomForest • Logistic Regression • KNN • LightGBM • SVM
Linear Regression • Polynomial Regression • Neural Networks (Multi-Layer Perceptron)
Big Data & Processing: MetaData • Git • AWS (Kinesis, EC2, S3, EMR, Athena, QuickSight) • Databricks
Data Visualization: Seaborn • Plotly • Matplotlib • Tableau • Excel