Comment Staytime Prediction with LLM-enhanced Comment Understanding
Changshuo Zhang, Zihan Lin, Shukai Liu, Yongqi Liu, Han Li
TL;DR
This work tackles staytime prediction in the comments section of short-video platforms, a less-explored signal for user engagement. It introduces KuaiComt, a large-scale real-world dataset, and proposes LCU, a two-stage framework that first fine-tunes an LLM on domain-specific comment tasks and then integrates LLM-derived embeddings with traditional staytime predictors through user-agnostic and user-specific ranking auxiliary tasks. The approach yields consistent gains in both staytime prediction and relevance ranking across multiple base models, validated by offline experiments on KuaiComt and online A/B testing on Kuaishou. The combination of LLM-based comment understanding and ranking signals demonstrates practical potential for improving recommendation systems and user experience in video platforms, with dataset and code openly released.
Abstract
In modern online streaming platforms, the comments section plays a critical role in enhancing the overall user experience. Understanding user behavior within the comments section is essential for comprehensive user interest modeling. A key indicator of user engagement is staytime, which refers to the amount of time that users spend browsing and posting comments. Existing watchtime prediction methods struggle to adapt to staytime prediction because they overlook interactions with individual comments and the interrelations among those comments. In this paper, we present a micro-video recommendation dataset with video comments, named KuaiComt, which is collected from the Kuaishou platform. Correspondingly, we propose a practical framework for comment staytime prediction with LLM-enhanced Comment Understanding (LCU). Our framework leverages the strong text comprehension capabilities of large language models (LLMs) to understand the textual information of comments, while also incorporating fine-grained comment ranking signals as auxiliary tasks. The framework has two stages: first, the LLM is fine-tuned on domain-specific tasks to bridge the video and the comments; second, we incorporate the LLM outputs into the prediction model and design two comment ranking auxiliary tasks to better capture user preference. Extensive offline experiments demonstrate the effectiveness of our framework, showing significant improvements on the task of comment staytime prediction. Additionally, online A/B testing further validates its practical benefits in an industrial scenario. Our dataset KuaiComt (https://github.com/lyingCS/KuaiComt.github.io) and code for LCU (https://github.com/lyingCS/LCU) are fully released.
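To make the idea of a ranking auxiliary task concrete, the following is a minimal sketch of how a staytime regression loss might be combined with a pairwise comment-ranking loss. It is not the released LCU implementation: the abstract does not specify the loss forms, so the mean squared error, the logistic pairwise loss, the function names, and the `aux_weight` parameter are all assumptions chosen for illustration.

```python
import math


def mse_loss(pred, target):
    """Mean squared error between predicted and observed staytime (assumed main loss)."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def pairwise_rank_loss(scores, labels):
    """Logistic pairwise ranking loss (an assumed form of the auxiliary task).

    For every comment pair (i, j) where labels[i] > labels[j], the loss is
    small when scores[i] exceeds scores[j] and large otherwise.
    """
    total, num_pairs = 0.0, 0
    for i in range(len(scores)):
        for j in range(len(scores)):
            if labels[i] > labels[j]:
                diff = scores[i] - scores[j]
                total += math.log1p(math.exp(-diff))
                num_pairs += 1
    # With no ordered pairs there is no ranking signal, so return zero.
    return total / num_pairs if num_pairs else 0.0


def combined_loss(staytime_pred, staytime_true, comment_scores, comment_labels,
                  aux_weight=0.1):
    """Main staytime loss plus a weighted ranking auxiliary loss.

    aux_weight is a hypothetical hyperparameter, not a value from the paper.
    """
    main = mse_loss(staytime_pred, staytime_true)
    aux = pairwise_rank_loss(comment_scores, comment_labels)
    return main + aux_weight * aux


if __name__ == "__main__":
    # Toy example: two sessions, three comments with engagement labels.
    loss = combined_loss(
        staytime_pred=[12.0, 30.0],
        staytime_true=[10.0, 33.0],
        comment_scores=[0.9, 0.2, 0.5],
        comment_labels=[1, 0, 0],
    )
    print(f"combined loss: {loss:.4f}")
```

In this sketch the auxiliary term only shapes how comments are ordered relative to each other, while the main term fits the staytime value itself; a user-specific variant could build `comment_labels` from one user's interactions and a user-agnostic variant from aggregate statistics, though that mapping is likewise an assumption here.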
