Well experienced with regression, classification, and NLP models
Online Shopping Purchase Prediction - Regression Model
This project applied an XGBoost regression model to predict the likelihood of a purchase based on customers' online shopping behavior. In addition, a recommendation model combining collaborative filtering and content-based filtering was built using customer purchase transaction data.
Product Review - Topic Clustering and Classification
This project included web scraping, clustering, and classification models. For the ML models, KMeans was applied to cluster 180 course types into 9 distinct segments, and XGBClassifier was applied to predict course types in order to complete the dataset for further analysis. The aim was to evaluate course reviews for an online education platform.
Sentiment Analysis - NLP and Classification
This project was run in Databricks using Spark to analyze recent news about cancer for sentiment evaluation. The goal was to apply traditional NLP techniques such as tokenization, stopword removal, CV and TF-IDF, and n-grams for text analysis. The project also used tools such as AWS S3, Athena, and QuickSight to handle big data.
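The text-processing steps named above (tokenization, stopword removal, n-grams, TF-IDF) can be illustrated with a minimal standard-library sketch. This is not the project's Spark code: the stopword list, the sample headlines, and the helper names `tokenize`, `ngrams`, and `tf_idf` are all hypothetical, and the smoothed IDF formula log((N + 1) / (df + 1)) is an assumption chosen for the illustration.

```python
import math
import re
from collections import Counter

# Hypothetical stopword list; a real pipeline would use a fuller one.
STOPWORDS = {"the", "a", "of", "in", "is", "for", "and", "to"}


def tokenize(text):
    """Lowercase the text, keep alphabetic tokens, and drop stopwords."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]


def ngrams(tokens, n):
    """Return contiguous n-grams joined by a single space."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def tf_idf(docs, n=1):
    """Score each n-gram per document as raw count * log((N + 1) / (df + 1))."""
    counts = [Counter(ngrams(tokenize(doc), n)) for doc in docs]
    # Document frequency: number of documents containing each term.
    df = Counter(term for c in counts for term in c)
    n_docs = len(docs)
    return [
        {term: k * math.log((n_docs + 1) / (df[term] + 1)) for term, k in c.items()}
        for c in counts
    ]


# Made-up sample headlines, for illustration only.
docs = [
    "New cancer treatment shows promise",
    "Cancer research funding is cut",
    "Treatment costs rise",
]
scores = tf_idf(docs)
bigram_scores = tf_idf(docs, n=2)
```

Terms that appear in fewer documents receive a higher weight, so in the first headline "new" outranks "cancer", which also occurs in the second headline.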
Skills and Qualifications
Programming Languages: SQL • Python (Pandas, NumPy, SciPy)
Machine Learning: Scikit-Learn • Classification • Regression • Clustering • PCA • SparkML
NLP • LangChain • HuggingFace • Keras • TensorFlow • PyTorch • LFQA
Modeling: XGBoost • RandomForest • Logistic Regression • KNN • LightGBM • SVM
Linear Regression • Polynomial Regression • Neural Networks (Multi-Layer Perceptron)
Big Data & Processing: MetaData • Git • AWS (Kinesis, EC2, S3, EMR, Athena, QuickSight) • Databricks
Data Visualization: Seaborn • Plotly • Matplotlib • Tableau • Excel