MA4DIV: Multi-Agent Reinforcement Learning for Search Result Diversification
Yiqun Chen, Jiaxin Mao, Yi Zhang, Dehong Ma, Long Xia, Jun Fan, Daiting Shi, Zhicong Cheng, Simiu Gu, Dawei Yin
TL;DR
MA4DIV reframes search result diversification as a cooperative multi-agent reinforcement learning problem, where each document is an agent and the ranking task is optimized via a QMIX-based value decomposition to maximize $α$-NDCG@$k$. The model uses a shared Agent Network with cross-document context from Multi-Head Self-Attention to produce per-agent Q-values that contribute to a global $Q_{tot}^{*}$, enabling end-to-end training with a one-step episode. Empirical results show MA4DIV achieving state-of-the-art performance on the industrial DU-DIV dataset and competitive gains on TREC datasets, while delivering significant efficiency advantages in training and inference. The work demonstrates the viability and benefits of a general multi-agent ranking framework that can be extended beyond SRD to other ranking objectives and settings.
Abstract
Search result diversification (SRD), which aims to ensure that documents in a ranking list cover a broad range of subtopics, is a significant and widely studied problem in Information Retrieval and Web Search. Existing methods primarily either follow a "greedy selection" paradigm, i.e., selecting the document with the highest diversity score one at a time, or optimize an approximation of the objective function. These approaches tend to be inefficient and are easily trapped in a suboptimal state. To address these challenges, we introduce Multi-Agent reinforcement learning (MARL) for search result DIVersity, called MA4DIV. In this approach, each document is an agent and search result diversification is modeled as a cooperative task among multiple agents. Modeling the SRD ranking problem as a cooperative MARL problem allows diversity metrics such as $α$-NDCG to be optimized directly while achieving high training efficiency. We conducted experiments on public TREC datasets and on a larger-scale dataset from an industrial setting. The experiments show that MA4DIV achieves substantial improvements over existing baselines in both effectiveness and efficiency, especially on the industrial dataset. The code of MA4DIV is available at https://github.com/chenyiqun/MA4DIV.
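To make the optimization target concrete, the following is a minimal sketch of how $α$-NDCG@$k$ is commonly computed, following its standard definition rather than the MA4DIV codebase. The function names, the representation of each document as a set of subtopic ids, and the default $α = 0.5$ are assumptions made for illustration. Because finding the truly ideal ordering is NP-hard, the normalizer here uses the usual greedy approximation.

```python
import math


def _gain(subtopics, seen, alpha):
    # Novelty-discounted gain: each subtopic is worth (1 - alpha)^c,
    # where c is how many higher-ranked documents already covered it.
    return sum((1 - alpha) ** seen.get(s, 0) for s in subtopics)


def alpha_dcg(ranking, k, alpha=0.5):
    """ranking: list of sets of subtopic ids, one set per document, in rank order."""
    seen = {}  # subtopic id -> number of higher-ranked documents covering it
    score = 0.0
    for rank, subtopics in enumerate(ranking[:k], start=1):
        score += _gain(subtopics, seen, alpha) / math.log2(rank + 1)
        for s in subtopics:
            seen[s] = seen.get(s, 0) + 1
    return score


def greedy_ideal(docs, k, alpha=0.5):
    """Greedy approximation of the ideal ordering (assumption: standard practice)."""
    remaining = list(docs)
    seen = {}
    ideal = []
    while remaining and len(ideal) < k:
        best = max(remaining, key=lambda d: _gain(d, seen, alpha))
        remaining.remove(best)
        ideal.append(best)
        for s in best:
            seen[s] = seen.get(s, 0) + 1
    return ideal


def alpha_ndcg(ranking, k, alpha=0.5):
    ideal_score = alpha_dcg(greedy_ideal(ranking, k, alpha), k, alpha)
    return alpha_dcg(ranking, k, alpha) / ideal_score if ideal_score > 0 else 0.0


# Hypothetical toy query: two documents cover subtopic 1, one covers subtopic 2.
redundant = [{1}, {1}, {2}]  # repeats subtopic 1 before introducing subtopic 2
diverse = [{1}, {2}, {1}]    # covers both subtopics in the top two positions
print(alpha_ndcg(redundant, k=3), alpha_ndcg(diverse, k=3))
```

In this toy case the diverse ordering scores higher than the redundant one, which is the behaviour a diversification model is rewarded for when the metric itself serves as the training signal.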
