HOTVCOM: Generating Buzzworthy Comments for Videos
Yuyan Chen, Yiwen Qian, Songzhou Yan, Jiyuan Jia, Zhixu Li, Yanghua Xiao, Xiaobo Li, Ming Yang, Qingpei Guo
TL;DR
This work addresses the generation of hot comments for Chinese short videos by introducing HotVCom, the largest Chinese video hot-comment dataset, and the ComHeat framework, which integrates visual, audio, and textual signals. ComHeat combines supervised fine-tuning, reinforcement learning with a reward model, and a knowledge-enhanced Tree-of-Thought to produce engaging, widely liked comments, guided by a comprehensive evaluation metric that captures informativeness, relevance, creativity, and user engagement. Empirical results show that ComHeat outperforms diverse baselines on HotVCom and on other video-comment tasks, and that it remains effective across languages on English TikTok data. The study contributes a dataset, an evaluation protocol, and a multi-modal method for hot-comment generation with potential marketing and branding impact on video platforms, and it points to future work on ethics, fairness, and cross-lingual expansion.
Abstract
In the era of social media video platforms, popular ``hot-comments'' play a crucial role in attracting user impressions for short-form videos, making them vital for marketing and branding purposes. However, existing research predominantly focuses on generating descriptive comments or ``danmaku'' in English, which offer immediate reactions to specific video moments. To address this gap, our study introduces \textsc{HotVCom}, the largest Chinese video hot-comment dataset, comprising 94k diverse videos and 137 million comments. We also present the \texttt{ComHeat} framework, which integrates visual, auditory, and textual data to generate influential hot-comments on the Chinese video dataset. Empirical evaluations demonstrate the effectiveness of our framework on both the newly constructed dataset and existing datasets.
