When Top-ranked Recommendations Fail: Modeling Multi-Granular Negative Feedback for Explainable and Robust Video Recommendation

Siran Chen, Boyu Chen, Chenyun Yu, Yi Ouyang, Cheng Lei, Chengxiang Zhuo, Zang Li, Yali Wang

TL;DR

To address negative feedback in video recommendation, the paper introduces the TVNF benchmark and the Agentic Explainable Negative Feedback (ENF) framework, a three-agent system (Profile, Video, Reason) that combines multimodal video analysis with psychographic profiling to predict negative feedback and generate explanations. It proposes S-GRPO, a progressive reinforcement learning strategy that trains the agents on tasks ordered from easy to hard and improves explainability. Empirical results on TVNF show improved negative-feedback prediction and reason classification, with notable gains over GPT-4o and competitive results on implicit feedback. Real-world deployment on Tencent News improves watch time, fast-skip rate, and dislike rate, supporting the method's practical impact and robustness.

Abstract

Existing video recommendation systems, relying mainly on ID-based embedding mapping and collaborative filtering, often fail to capture in-depth video content semantics. Moreover, most struggle to address biased user behaviors (e.g., accidental clicks, fast skips), leading to inaccurate interest modeling and frequent negative feedback in top recommendations with unclear causes. To tackle this issue, we collect real-world user video-watching sequences, annotate the reasons for users' dislikes, and construct a benchmark dataset for personalized explanations. We then introduce the Agentic Explainable Negative Feedback (ENF) framework, which integrates three core components: (1) the Profile Agent, extracting behavioral cues from users' historical data to derive psychological and personality profiles; (2) the Video Agent, performing comprehensive multimodal video analysis; and (3) the Reason Agent, synthesizing information from the other two agents to predict user engagement and generate explanations. Additionally, we propose the S-GRPO algorithm, enabling the model to progressively address complex tasks during reinforcement fine-tuning. Experimental results on the collected dataset show that our method significantly outperforms state-of-the-art baselines in negative feedback prediction and reason explanation. Notably, it achieves an 8.6% improvement over GPT-4o in reason classification. Deployment on the business platform further validates its benefits: increasing average user watch time by 6.2%, reducing the fast-skip rate by 9.4%, and significantly enhancing user satisfaction.

Paper Structure

This paper contains 24 sections, 5 equations, 5 figures, 7 tables.

Figures (5)

  • Figure 1: User negative feedback in a real recommendation scenario. Based on high embedding similarity, a traditional method recommends a food-sharing video to a user who loves food-related themes, but the video triggers strong negative feedback. Our ENF framework successfully predicts the reason for the user's negative feedback, avoiding similar recommendations in the future.
  • Figure 2: Overview of our agent-based ENF framework. The three agents collaborate: the Profile Agent analyzes user behaviors to build a more comprehensive profile, the Video Agent provides multimodal insights, and the Reason Agent uses the updated profile to predict whether a user likes the recommended video and provides explainable reasons.
  • Figure 3: Training process of our agents. In the first stage, we use real user feedback reasons for cold start. In the second stage, we propose a progressive reward mechanism that provides step rewards for a response in order from easy to hard.
  • Figure 4: Training samples. The green line marks the ground-truth answer.
  • Figure 5: Case Study. We present specific user examples to illustrate why they choose to fast-skip the video.