Rich-Media Re-Ranker: A User Satisfaction-Driven LLM Re-ranking Framework for Rich-Media Search

Zihao Guo, Ligang Zhou, Zeyang Tang, Feicheng Li, Ying Nie, Zhiming Peng, Qingyun Sun, Jianxin Li

TL;DR

The paper addresses the limitations of existing re-ranking methods in rich-media search by modeling multifaceted user intents and leveraging rich, multimodal signals. It introduces Rich-Media Re-Ranker, which combines a session-aware Query Planner, a VLM-based Cover Image Value Assessment, and an LLM-based re-ranker trained with multi-task reinforcement learning (GRPO). Together they optimize a multifaceted listwise ranking objective that covers relevance, quality, information gain, novelty, and visual cues. Extensive offline evaluations and ablations show substantial improvements over strong baselines, and online deployment in a large-scale system shows gains in engagement and satisfaction metrics. The work advances practical, interpretable re-ranking for multimodal search and provides a deployable, data-driven approach to aligning results with diverse user intents and visual preferences.
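The TL;DR names GRPO as the training algorithm. GRPO scores a group of sampled outputs for the same input and uses each output's reward, standardized within the group, as its advantage in place of a learned critic. The sketch below shows only that advantage step. It assumes, purely for illustration, that the reward for a sampled ranking is NDCG against graded relevance labels; the paper's actual multi-task reward is not given in this summary. `ndcg_at_k` and `group_relative_advantages` are hypothetical names, not the authors' code.

```python
import math
from statistics import mean, pstdev


def ndcg_at_k(ranking, gains, k):
    """NDCG@k of a ranked list of doc ids, given graded gains (dict: doc id -> gain).

    Assumed reward for illustration only; not the paper's reward definition.
    """
    def dcg(ids):
        return sum((2 ** gains.get(d, 0) - 1) / math.log2(i + 2)
                   for i, d in enumerate(ids[:k]))

    ideal = dcg(sorted(gains, key=gains.get, reverse=True))
    return dcg(ranking) / ideal if ideal > 0 else 0.0


def group_relative_advantages(rewards, eps=1e-6):
    """GRPO-style advantages: standardize each reward by the group mean and std."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# A group of rankings sampled from the re-ranker for one query.
gains = {"a": 3, "b": 2, "c": 0}
group = [["a", "b", "c"], ["b", "a", "c"], ["c", "b", "a"]]
rewards = [ndcg_at_k(r, gains, k=3) for r in group]
advantages = group_relative_advantages(rewards)
```

Rankings that beat the group average receive positive advantages and the rest negative ones. The clipped policy-gradient objective and KL penalty that complete GRPO are omitted here.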

Abstract

Re-ranking plays a crucial role in modern information search systems by refining the ranking of initial search results to better satisfy user information needs. However, existing methods show two notable limitations in improving user search satisfaction: inadequate modeling of multifaceted user intents and neglect of rich side information such as visual perception signals. To address these challenges, we propose the Rich-Media Re-Ranker framework, which aims to enhance user search satisfaction through multi-dimensional and fine-grained modeling. Our approach begins with a Query Planner that analyzes the sequence of query refinements within a session to capture genuine search intents, decomposing the query into clear and complementary sub-queries to enable broader coverage of users' potential intents. Next, moving beyond primary text content, we integrate richer side information about candidate results, including visual-content signals generated by a VLM-based evaluator. These signals are then processed alongside carefully designed re-ranking principles that consider multiple facets, including content relevance and quality, information gain, information novelty, and the visual presentation of cover images. An LLM-based re-ranker then performs a holistic evaluation based on these principles and the integrated signals. To improve the scenario adaptability of the VLM-based evaluator and the LLM-based re-ranker, we further strengthen both through multi-task reinforcement learning. Extensive experiments demonstrate that our method significantly outperforms state-of-the-art baselines. Notably, the proposed framework has been deployed in a large-scale industrial search system, yielding substantial improvements in online user engagement rates and satisfaction metrics.
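The abstract describes decomposing a query into complementary sub-queries to broaden intent coverage. The Figure 4 caption adds that the top-k results of the sub-queries are combined into the re-ranking candidate set. The sketch below shows only that pooling step. It assumes the sub-queries have already been produced by the Query Planner and that `search_fn` is a hypothetical retrieval callable returning ranked doc ids. Round-robin interleaving with de-duplication is an assumed merging rule; the paper does not specify one here.

```python
from typing import Callable, List


def pool_candidates(sub_queries: List[str],
                    search_fn: Callable[[str, int], List[str]],
                    k: int) -> List[str]:
    """Union the top-k doc ids of every sub-query into one candidate list.

    Results are interleaved round-robin by rank, so each sub-query (intent)
    contributes early, and a document retrieved by several sub-queries is kept once.
    """
    per_query = [search_fn(q, k)[:k] for q in sub_queries]
    seen, pooled = set(), []
    for rank in range(k):
        for results in per_query:
            if rank < len(results) and results[rank] not in seen:
                seen.add(results[rank])
                pooled.append(results[rank])
    return pooled


# Toy index standing in for the first-stage retriever (hypothetical data).
index = {"q1": ["d1", "d2", "d3"], "q2": ["d2", "d4", "d5"]}
candidates = pool_candidates(["q1", "q2"], lambda q, k: index[q], k=2)
```

Interleaving by rank, rather than concatenating per-query lists, keeps any single sub-query from crowding the others out of the candidate set.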

Paper Structure (38 sections, 5 equations, 9 figures, 5 tables)

Figures (9)

  • Figure 1: Comparison between original ranking and desired re-ranking results in rich-media search system.
  • Figure 2: The workflow of Query Planner.
  • Figure 3: Example of cover image value.
  • Figure 4: Framework of our Rich-Media Re-Ranker. (a) We first leverage a Session-aware Query Planner to capture the user's multi-dimensional intent. The top-k results from the decomposed sub-queries are combined into the candidate set for re-ranking, which improves intent coverage. (b) The VLM Evaluator assesses the relevance and quality of document cover images and integrates these assessments with side information, providing signals for subsequent re-ranking. (c) All documents are organized in a structured manner according to the designed re-ranking principle. Considering multiple dimensions, the LLM Re-ranker produces a document ranking that aligns with user satisfaction. (d) We enhance the capabilities of the VLM Evaluator and LLM Re-ranker on real-world data and tasks through multi-task reinforcement learning.
  • Figure 5: Case study on an online rich-media search system. Left: online system; right: our Rich-Media Re-Ranker.
  • ...and 4 more figures
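Figure 4(b) and (c) describe combining VLM cover-image assessments with side information, organizing all documents in a structured form according to the re-ranking principle, and having the LLM produce a ranking. The sketch below shows one hypothetical way to do the prompt-building and output-parsing steps. The `Doc` fields, the prompt layout, and the `[2] > [1]` answer format (borrowed from common listwise LLM re-ranking prompts) are assumptions, not the paper's format, and the LLM call itself is left out. `parse_ranking` tolerates invalid or duplicate ids so the result is always a full permutation.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, List

# Facets named in the paper; the exact wording used here is illustrative.
PRINCIPLES = [
    "content relevance and quality",
    "information gain",
    "information novelty",
    "visual presentation of the cover image",
]


@dataclass
class Doc:
    doc_id: str
    title: str
    snippet: str
    cover_relevance: float  # hypothetical VLM Evaluator score in [0, 1]
    cover_quality: float    # hypothetical VLM Evaluator score in [0, 1]
    side_info: Dict[str, str] = field(default_factory=dict)  # e.g. author, likes


def build_prompt(query: str, docs: List[Doc]) -> str:
    """Lay out the principles and one structured line per candidate document."""
    lines = [f"Query: {query}", "Rank the documents by these principles:"]
    lines += [f"- {p}" for p in PRINCIPLES]
    for i, d in enumerate(docs, start=1):
        extra = "; ".join(f"{k}={v}" for k, v in d.side_info.items())
        lines.append(
            f"[{i}] {d.title} | {d.snippet} | cover_relevance={d.cover_relevance:.2f} "
            f"| cover_quality={d.cover_quality:.2f} | {extra}"
        )
    lines.append("Answer with identifiers only, e.g. [2] > [1] > [3].")
    return "\n".join(lines)


def parse_ranking(output: str, n: int) -> List[int]:
    """Turn '[2] > [1] ...' into 0-based indices.

    Out-of-range and repeated ids are dropped; documents the model omitted
    are appended in their original (first-stage) order.
    """
    order: List[int] = []
    for m in re.findall(r"\[(\d+)\]", output):
        idx = int(m) - 1
        if 0 <= idx < n and idx not in order:
            order.append(idx)
    order += [i for i in range(n) if i not in order]
    return order
```

Appending omitted documents in their first-stage order keeps the original ranking as a safe fallback when the model's output is incomplete or malformed.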