PopALM: Popularity-Aligned Language Models for Social Media Trendy Response Prediction
Erxin Yu, Jing Li, Chunpu Xu
TL;DR
The paper tackles trendy response prediction for social media, aiming to generate replies likely to garner many likes. It introduces PopALM, a popularity-aligned language model trained with reinforcement learning, using a curriculum-learning-enhanced proximal policy optimization (CL-PPO) framework to cope with noisy like-based rewards. The approach combines supervised fine-tuning, reward modeling, and PPO-based reinforcement learning, augmented by three CL-PPO components: reward enhancement, reward ranking, and self-paced sampling. Experiments on a large-scale Weibo dataset show that PopALM improves both automatic and human evaluation metrics across multiple backbones and parameter-efficient fine-tuning (PEFT) methods, and that the generated responses benefit downstream tasks such as poll question generation and social emotion prediction. The work offers a practical method for aligning generative models with popularity signals in real-world social media contexts.
Abstract
Social media platforms surface millions of events every day. To give an early prediction of the mainstream public reaction to these events, we study trendy response prediction, which automatically generates top-liked user replies to social media events. While previous work focuses on generating responses without factoring in popularity, we propose Popularity-Aligned Language Models (PopALM) to distinguish responses liked by a larger audience through reinforcement learning. Because user "likes" are noisy labels, we tailor curriculum learning to proximal policy optimization (PPO) so that models capture the essential samples in easy-to-hard training. In experiments, we build a large-scale Weibo dataset for trendy response prediction, and the results show that PopALM helps boost the performance of advanced language models.
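To make the easy-to-hard idea concrete, the following is a minimal sketch of curriculum-style sample selection, not the authors' implementation. It assumes that each training sample already carries a scalar score from a reward model and that a higher score marks an "easier" sample. The names `Sample`, `reward`, and `self_paced_pools` are hypothetical and introduced only for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Sample:
    post: str
    response: str
    reward: float  # hypothetical reward-model score; higher is assumed "easier"


def self_paced_pools(samples: List[Sample], num_stages: int) -> List[List[Sample]]:
    """Return one training pool per stage, growing from easy to hard.

    Assumption (not from the paper): difficulty is the inverse of the
    reward score, so the highest-reward samples are admitted first.
    """
    if num_stages < 1:
        raise ValueError("num_stages must be at least 1")
    # Order samples from easiest (highest reward) to hardest (lowest reward).
    ordered = sorted(samples, key=lambda s: s.reward, reverse=True)
    pools = []
    for stage in range(1, num_stages + 1):
        # Stage k admits the easiest k/num_stages fraction of the data,
        # keeping at least one sample when the data is non-empty.
        cutoff = max(1, (len(ordered) * stage) // num_stages)
        pools.append(ordered[:cutoff])
    return pools
```

In such a setup, a PPO loop would draw its batches from `pools[0]` first and move to later pools as training progresses, so that noisier, lower-scored samples enter only after the policy has seen the clearer ones.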
