Projects

Identifying Popular Products at an Early Stage for the Apparel Industry

May 01, 2022

• Proposed a new indicator, AW Sales (Average Weekly Sales in Main Sales Period), to measure product popularity; it is unaffected by differences in store traffic, the number of stores with initial stock, discounts, and the length of time since the product was launched.
• Constructed novel features such as the longest increasing subsequences derived from the sequence of weekly adjusted sales volume of product k across all stores within a typical region.
• Built a product popularity classification model for the apparel field using the LambdaMART ranking model, the first application of a ranking algorithm to sales prediction. Achieved a prediction accuracy of 78.9% and identified fast-selling products 17 days earlier than the rule-based method.
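
The longest-increasing-subsequence feature mentioned above can be illustrated with a minimal sketch. It assumes the feature is the length of the longest strictly increasing subsequence of a product's weekly adjusted sales; the function name and the sample figures are hypothetical and are not taken from the project.

```python
from bisect import bisect_left


def lis_length(weekly_sales):
    """Length of the longest strictly increasing subsequence.

    Hypothetical helper: `weekly_sales` is assumed to be a list of
    weekly adjusted sales volumes for one product in one region.
    """
    # tails[i] holds the smallest possible last value of any strictly
    # increasing subsequence of length i + 1 seen so far.
    tails = []
    for value in weekly_sales:
        pos = bisect_left(tails, value)
        if pos == len(tails):
            tails.append(value)   # extends the longest subsequence found so far
        else:
            tails[pos] = value    # keeps a smaller tail for that length
    return len(tails)


# Illustrative (made-up) weekly adjusted sales for one product.
sample_weeks = [3, 5, 4, 6, 8, 7, 9]
feature_value = lis_length(sample_weeks)
```

A longer increasing subsequence suggests a more sustained upward sales trend, which is one plausible reason such a feature could help a ranking model separate fast-selling products from the rest.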

Text Mining and Classification in Yunduoduo (Team Project)

December 01, 2021

• Collected data from the forum “Yunduoduo” and fine-tuned a BERT model to predict the sentiment of posts.
• Compared the accuracy and CPU time of various machine learning algorithms, including Random Forest, Logistic Regression, and Linear Support Vector Machines; proposed a prototype of an automatic classification application.

Time Series: Analysis of AQI of Delhi 2015-2020

November 01, 2021

Utilized R to model and predict Air Quality Index (AQI) for the 12-month period from August 2019 to August 2020, based on historical monthly air quality data from Delhi, India.

Plan-Based Bus Bridging Strategy During Disruptions of Urban Rail Networks

February 01, 2021

• Constructed an Integer Linear Programming (ILP) model for “flexible scheduling” of buses to facilitate efficient passenger transportation during rail network disruptions. Developed a program using CPLEX to find the exact solution to the ILP model.
• Designed and implemented a heuristic algorithm in Python to speed up the solution process while consistently achieving near-optimal results. Introduced neighborhood actions in the local search to improve the algorithm’s performance, yielding solutions that deviate by only 2.2% from the exact solution.