Concepedia

Publication | Closed Access

Catboost-based Framework with Additional User Information for Social Media Popularity Prediction

43

Citations

1

References

2019

Year

Abstract

In this paper, a Catboost-based framework is proposed to predict social media popularity. The framework is constituted by two components: feature representation and Catboost training. In the component of feature representation, numerical features are directly used, while categorical features are converted into numerical features by a method of order target statistics in Catboost. Besides, some additional user information is also tracked to enrich the feature space. In the other component, Catboost is adopted as the regression model which is trained by using post-related, user-related and additional user information. Moreover, to make full use of the dataset for model training, a dataset augmentation strategy based on pseudo labels is proposed. This strategy involves in two-stage training. In the first stage, it trains a first-stage model that is used to label the test set as pseudo labeled. In the next stage, a final model is trained based on the new training set that includes original validation set and the pseudo labeled test set. The proposed method achieves the 2nd place in the leader board of the Grand Challenge of Social Media Prediction.

References

YearCitations

Page 1