PopALM: Popularity-Aligned Language Models for Social Media Trendy Response Prediction

Erxin Yu, Jing Li, Chunpu Xu

TL;DR

The paper tackles trendy response prediction for social media, aiming to generate replies likely to garner many likes. It introduces PopALM, a popularity-aligned language model trained with reinforcement learning and a curriculum-learning-enhanced PPO framework to cope with noisy like-based rewards. The approach combines supervised fine-tuning, reward modeling, and PPO-based RL, augmented by three CL-PPO components: reward enhancement, reward ranking, and self-paced sampling. Experiments on a large-scale Weibo dataset show PopALM improves both automatic and human evaluation metrics across multiple backbones and PEFT methods, and the generated responses enhance downstream tasks like poll question generation and social emotion prediction. This work provides a practical method for aligning generative models with popularity signals in real-world social media contexts.

Abstract

Social media platforms surface millions of events every day. To predict the mainstream public reaction to these events at an early stage, we study trendy response prediction, which automatically generates top-liked user replies to social media events. While previous work generates responses without factoring in popularity, we propose Popularity-Aligned Language Models (PopALM) to distinguish responses liked by a larger audience through reinforcement learning. Because user "likes" are noisy labels, we tailor curriculum learning to proximal policy optimization (PPO) so that models capture the essential samples through easy-to-hard training. For the experiments, we build a large-scale Weibo dataset for trendy response prediction; the results show that PopALM can help boost the performance of advanced language models.

Paper Structure (33 sections, 5 equations, 5 figures, 6 tables, 1 algorithm)

Figures (5)

  • Figure 1: A Weibo post about "Volunteer Leaked Exam Questions", followed by its top-3 trendy responses with like numbers. The bottom presents a response sample generated by ChatGPT.
  • Figure 2: The PopALM workflow, based on curriculum-learning-enhanced PPO (CL-PPO), which exploits three novel strategies to leverage noisy user-like labels as popularity indicators: reward enhancement (bottom left; for task-specific supervision), reward ranking (bottom right; for filtering noisy training samples), and self-paced reward sampling (top right; for training from easy to hard).
  • Figure 3: Distribution of response frequency (y-axis) over like numbers (x-axis). Red bars correspond to the more popular 50% of responses; the remaining bars are blue.
  • Figure 4: Effects of training data scales (x-axis). The y-axis shows the ROUGE-L score of the top-3 prediction based on ChatGLM. The colored bands indicate $\pm$1 standard deviation corresponding to different percentages of training data.
  • Figure 5: Ablation study on CL-PPO. We report the ROUGE-L scores of the top-3 trendy response predictions for GPT-2, LLaMA, and ChatGLM. For each model, PEFT results with LoRA are shown on the left and with P-Tuning on the right. In each bar plot, the bars from left to right show PPO, CL-PPO, and the CL-PPO ablations without reward enhancement, without reward ranking, and without self-paced sampling.
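
The Figure 2 caption describes self-paced sampling only as a way of "training from easy to hard". The sketch below is one generic way such a schedule could look, not the authors' procedure. The `Sample` type, the `difficulty` score, the `self_paced_subset` function, the linear schedule, and the `start_frac` parameter are all hypothetical names and choices introduced for illustration; this page does not say how PopALM measures difficulty or paces its sampling.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Sample:
    text: str
    difficulty: float  # hypothetical score; lower means easier


def self_paced_subset(samples: List[Sample], step: int, total_steps: int,
                      start_frac: float = 0.3) -> List[Sample]:
    """Return the easiest samples admitted at a given training step.

    Assumption: the admitted fraction grows linearly from start_frac
    at step 0 to 1.0 at the final step.
    """
    if total_steps <= 1:
        frac = 1.0
    else:
        # Clamp the step into range, then map it to progress in [0, 1].
        clamped = min(max(step, 0), total_steps - 1)
        progress = clamped / (total_steps - 1)
        frac = start_frac + (1.0 - start_frac) * progress
    # Easiest samples first.
    ranked = sorted(samples, key=lambda s: s.difficulty)
    keep = min(len(ranked), max(1, round(frac * len(ranked))))
    return ranked[:keep]
```

Calling `self_paced_subset(data, step, total_steps)` once per training step yields a pool that starts with the easiest samples and grows until it covers the whole dataset. A linear schedule is the simplest choice; any other monotonically increasing schedule fits the same easy-to-hard idea.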