Multimodal Query Suggestion with Multi-Agent Reinforcement Learning from Human Feedback
Zheng Wang, Bingzheng Gan, Wei Shi
TL;DR
The paper tackles multimodal query suggestion: generating text queries from a user's query image, with a focus on intentionality and diversity. It introduces RL4Sugg, a two-agent framework in which Agent-I optimizes intentionality via a RewardNet and a PolicyNet, and Agent-D promotes diversity through its own policy; both are trained with RLHF and build on LLMs. Across two real-world datasets, RL4Sugg achieves an 18% improvement over strong baselines in generation, along with better retrieval metrics, and its deployment in production has helped boost user engagement. The approach demonstrates a practical path for integrating multimodal cues into search engine query formulation and suggests future work extending to additional modalities such as audio and video.
Abstract
In the rapidly evolving landscape of information retrieval, search engines strive to provide more personalized and relevant results to users. Query suggestion systems play a crucial role in achieving this goal by assisting users in formulating effective queries. However, existing query suggestion systems mainly rely on textual inputs, which can limit the search experience for users who query with images. In this paper, we introduce a novel Multimodal Query Suggestion (MMQS) task, which aims to generate query suggestions based on user query images to improve the intentionality and diversity of search results. We present the RL4Sugg framework, which leverages Large Language Models (LLMs) with Multi-Agent Reinforcement Learning from Human Feedback to optimize the generation process. Through comprehensive experiments, we validate the effectiveness of RL4Sugg, demonstrating an 18% improvement compared to the best existing approach. Moreover, MMQS has been deployed in real-world search engine products, where it has yielded enhanced user engagement. Our research advances query suggestion systems and provides a new perspective on multimodal information retrieval.
