When & How to Write for Personalized Demand-aware Query Rewriting in Video Search

Cheng Cheng, Chenxing Wang, Aolin Li, Haijun Wu, Huiyun Hu, Juyuan Wang

TL;DR

WeWrite is a Personalized Demand-aware Query Rewriting framework that uses user history to resolve ambiguous video search queries. It decides when to rewrite through posterior-based sample mining from user logs, learns how to rewrite through a hybrid SFT+GRPO training paradigm, and is deployed through a parallel "Fake Recall" architecture that keeps latency low. In online A/B testing on a large-scale video platform, it improves Click-Through Video Volume (VV>10s) by 1.07% and reduces the Query Reformulation Rate by 2.97%.

Abstract

In video search systems, user historical behaviors provide rich context for identifying search intent and resolving ambiguity. However, traditional methods utilizing implicit history features often suffer from signal dilution and delayed feedback. To address these challenges, we propose WeWrite, a novel Personalized Demand-aware Query Rewriting framework. Specifically, WeWrite tackles three key challenges: (1) When to Write: An automated posterior-based mining strategy extracts high-quality samples from user logs, identifying scenarios where personalization is strictly necessary; (2) How to Write: A hybrid training paradigm combines Supervised Fine-Tuning (SFT) with Group Relative Policy Optimization (GRPO) to align the LLM's output style with the retrieval system; (3) Deployment: A parallel "Fake Recall" architecture ensures low latency. Online A/B testing on a large-scale video platform demonstrates that WeWrite improves the Click-Through Video Volume (VV>10s) by 1.07% and reduces the Query Reformulation Rate by 2.97%.
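
As a rough illustration of the "group relative" part of GRPO mentioned above, the sketch below normalizes the rewards of several candidate rewrites sampled for the same query against that group's own mean and standard deviation. The reward values, the function name `group_relative_advantages`, and the use of the population standard deviation are assumptions made for illustration; this excerpt does not specify WeWrite's actual reward design.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward against its own sampling group (GRPO-style).

    A rewrite scores a positive advantage only if it beats the average
    of the other rewrites sampled for the same query.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumption of this sketch
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical retrieval rewards for four candidate rewrites of one query.
rewards = [0.2, 0.8, 0.5, 0.5]
advantages = group_relative_advantages(rewards)
```

Because the baseline is the group mean, no separate value model is needed; candidates that match the group average receive an advantage of about zero and contribute little to the policy update.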

Paper Structure (16 sections, 6 equations, 3 figures)

Figures (3)

  • Figure 1: Positive Case: WeWrite resolves ambiguity (Singer vs. Liquor) using user history.
  • Figure 2: Negative Case: Indiscriminate rewriting causes intent drift. Functional queries ("Air fryer") should not be rewritten based on entertainment history.
  • Figure 3: Overview of the proposed framework. It comprises offline mining of intent-aligned samples, hybrid LLM training (SFT+RL) for style alignment, and an online parallel "Fake Recall" architecture to minimize latency.
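
The Figure 3 caption describes the online path only as a parallel architecture meant to minimize latency; this excerpt does not spell out how "Fake Recall" works internally. The sketch below therefore illustrates only the general idea of running a rewrite branch concurrently with recall on the original query, so that the rewriter does not sit on the critical path. The function names, the simulated delays, and the `budget` timeout are all hypothetical.

```python
import asyncio


async def recall(query, delay):
    """Hypothetical stand-in for a retrieval call; returns fake doc ids."""
    await asyncio.sleep(delay)
    return [f"{query}#doc{i}" for i in range(2)]


async def rewrite(query, history, delay):
    """Hypothetical stand-in for the LLM rewriter: appends the latest history item."""
    await asyncio.sleep(delay)
    return f"{query} {history[-1]}"


async def rewrite_then_recall(query, history, rewrite_delay, recall_delay):
    rewritten = await rewrite(query, history, rewrite_delay)
    return await recall(rewritten, recall_delay)


async def search(query, history, budget=0.2, rewrite_delay=0.01, recall_delay=0.01):
    # Start both branches at once so the rewrite overlaps with the original recall.
    original = asyncio.create_task(recall(query, recall_delay))
    extra = asyncio.create_task(
        rewrite_then_recall(query, history, rewrite_delay, recall_delay)
    )
    results = await original
    try:
        # Assumption of this sketch: wait at most `budget` extra seconds.
        results = results + await asyncio.wait_for(extra, timeout=budget)
    except asyncio.TimeoutError:
        pass  # Rewrite branch was too slow; serve original results only.
    return results
```

Calling `asyncio.run(search("apple", ["phone review"]))` merges results from both branches, while a slow rewriter (for example `rewrite_delay=1.0` with `budget=0.01`) falls back to the original query's results alone.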