MA4DIV: Multi-Agent Reinforcement Learning for Search Result Diversification

Yiqun Chen; Jiaxin Mao; Yi Zhang; Dehong Ma; Long Xia; Jun Fan; Daiting Shi; Zhicong Cheng; Simiu Gu; Dawei Yin

TL;DR

MA4DIV reframes search result diversification as a cooperative multi-agent reinforcement learning problem, where each document is an agent and the ranking task is optimized via a QMIX-based value decomposition to maximize $α$-NDCG@$k$. The model uses a shared Agent Network with cross-document context from Multi-Head Self-Attention to produce per-agent Q-values that contribute to a global $Q_{tot}^{*}$, enabling end-to-end training with a one-step episode. Empirical results show MA4DIV achieving state-of-the-art performance on the industrial DU-DIV dataset and competitive gains on TREC datasets, while delivering significant efficiency advantages in training and inference. The work demonstrates the viability and benefits of a general multi-agent ranking framework that can be extended beyond SRD to other ranking objectives and settings.
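The value decomposition mentioned above can be illustrated with a generic QMIX-style monotonic mixer. This is a minimal NumPy sketch, not the authors' network: the class name `MonotonicMixer`, the layer sizes, the random (untrained) hypernetwork weights, and the single linear layer for each hypernetwork are all assumptions made for illustration. The point it shows is the constraint that makes the decomposition work: the mixing weights are forced non-negative, so $Q_{tot}$ never decreases when any single agent's Q-value increases.

```python
import numpy as np


def elu(x):
    """Monotonically increasing activation, as commonly used in QMIX-style mixers."""
    return np.where(x > 0, x, np.exp(np.minimum(x, 0.0)) - 1.0)


class MonotonicMixer:
    """Hypothetical QMIX-style mixer: per-agent Q-values -> scalar Q_tot.

    The hypernetworks here are single random linear maps (an assumption);
    a trained model would learn them.
    """

    def __init__(self, n_agents, state_dim, embed_dim=8, seed=0):
        rng = np.random.default_rng(seed)
        self.n_agents, self.embed_dim = n_agents, embed_dim
        self.hyper_w1 = rng.normal(size=(state_dim, n_agents * embed_dim))
        self.hyper_b1 = rng.normal(size=(state_dim, embed_dim))
        self.hyper_w2 = rng.normal(size=(state_dim, embed_dim))
        self.hyper_b2 = rng.normal(size=(state_dim, 1))

    def __call__(self, agent_qs, state):
        # Absolute value keeps mixing weights non-negative (monotonicity constraint).
        w1 = np.abs(state @ self.hyper_w1).reshape(self.n_agents, self.embed_dim)
        b1 = state @ self.hyper_b1
        hidden = elu(agent_qs @ w1 + b1)
        w2 = np.abs(state @ self.hyper_w2)
        b2 = (state @ self.hyper_b2)[0]
        return float(hidden @ w2 + b2)


mixer = MonotonicMixer(n_agents=4, state_dim=6)
state = np.linspace(-1.0, 1.0, 6)          # stand-in for the global state
qs = np.array([0.2, -0.5, 1.0, 0.3])       # one Q-value per document-agent
q_tot = mixer(qs, state)
```

Because the biases are unconstrained while the weights are not, the mixer can still represent a rich family of functions of the state without breaking the monotonic relationship between each agent's Q-value and the global value.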

Abstract

Search result diversification (SRD), which aims to ensure that documents in a ranking list cover a broad range of subtopics, is a significant and widely studied problem in Information Retrieval and Web Search. Existing methods primarily either follow a "greedy selection" paradigm, i.e., selecting the document with the highest diversity score one at a time, or optimize an approximation of the objective function. These approaches tend to be inefficient and are easily trapped in a suboptimal state. To address these challenges, we introduce Multi-Agent reinforcement learning (MARL) for search result DIVersity, called MA4DIV. In this approach, each document is an agent and search result diversification is modeled as a cooperative task among multiple agents. Modeling the SRD ranking problem as a cooperative MARL problem allows diversity metrics such as $α$-NDCG to be optimized directly while achieving high training efficiency. We conducted experiments on public TREC datasets and on a larger-scale dataset from an industrial setting. The experiments show that MA4DIV achieves substantial improvements in both effectiveness and efficiency over existing baselines, especially on the industrial dataset. The code of MA4DIV is available at https://github.com/chenyiqun/MA4DIV.
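To make the target metric concrete, here is a minimal sketch of $α$-NDCG@$k$ in plain Python. It is an illustration, not the authors' evaluation code: the function names, the representation of each document as a set of subtopic ids, the default $α = 0.5$, and the use of a greedy ordering to approximate the ideal ranking (a common practice, since the exact ideal ordering is hard to compute) are all assumptions.

```python
import math


def _gain(subtopics, seen, alpha):
    # Each subtopic's gain decays by (1 - alpha) for every earlier document covering it.
    return sum((1 - alpha) ** seen.get(s, 0) for s in subtopics)


def alpha_dcg(ranking, alpha=0.5, k=10):
    """alpha-DCG@k for a ranking given as a list of subtopic-id sets."""
    seen, score = {}, 0.0
    for rank, subtopics in enumerate(ranking[:k], start=1):
        score += _gain(subtopics, seen, alpha) / math.log2(rank + 1)
        for s in subtopics:
            seen[s] = seen.get(s, 0) + 1
    return score


def greedy_ideal(docs, alpha=0.5, k=10):
    """Greedy approximation of the ideal ordering (assumption: not the exact optimum)."""
    remaining, seen, ideal = list(docs), {}, []
    while remaining and len(ideal) < k:
        best = max(remaining, key=lambda d: _gain(d, seen, alpha))
        remaining.remove(best)
        ideal.append(best)
        for s in best:
            seen[s] = seen.get(s, 0) + 1
    return ideal


def alpha_ndcg(ranking, alpha=0.5, k=10):
    """alpha-NDCG@k: alpha-DCG of the ranking divided by that of the greedy ideal."""
    ideal_score = alpha_dcg(greedy_ideal(ranking, alpha, k), alpha, k)
    return alpha_dcg(ranking, alpha, k) / ideal_score if ideal_score > 0 else 0.0


diverse = [{1}, {2}, {1}]     # a new subtopic appears at rank 2
redundant = [{1}, {1}, {2}]   # rank 2 repeats subtopic 1
scores = (alpha_ndcg(diverse, k=3), alpha_ndcg(redundant, k=3))
```

In this toy case the diverse ordering matches the greedy ideal, while the redundant ordering scores lower because the repeated subtopic at rank 2 earns a discounted gain.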

Paper Structure (39 sections, 21 equations, 5 figures, 7 tables, 2 algorithms)

Introduction
Related Works
Search Result Diversification
Reinforcement Learning for IR
Multi-Agent Reinforcement Learning
Background
Co-MARL
General Format of Test Set for Diversified Search
The MA4DIV Model
Essential Elements of MA4DIV
Agent Network
Ranking Process
Value-Decomposition for MA4DIV
Mixing & Hyper Network Structure
Training Process of MA4DIV
...and 24 more sections

Figures (5)

Figure 1: The Framework of MA4DIV.
Figure 2: Training curves of MA4DIV on TREC web track datasets.
Figure 3: Training curves of MA4DIV and MDP-DIV on DU-DIV dataset.
Figure 4: Average number of subtopics across the 15 documents in the DU-DIV dataset. The abscissa from 1 to 15 represents positions 1 to 15 in the ideal permutation, and the ordinate is the average of the number of subtopics contained in the documents of the corresponding position.
Figure 5: This figure illustrates the relationship between the quantity of newly sampled document lists and the count of documents that include any subtopics present in the sampled list.

Theorems & Definitions (1)

Definition 1: Permutation Invariance
