Principled and Scalable Diversity-Aware Retrieval via Cardinality-Constrained Binary Quadratic Programming

Qiheng Lu, Nicholas D. Sidiropoulos

Abstract

Diversity-aware retrieval is essential for Retrieval-Augmented Generation (RAG), yet existing methods lack theoretical guarantees and face scalability issues as the number of retrieved passages $k$ increases. We propose a principled formulation of diversity-aware retrieval as a cardinality-constrained binary quadratic program (CCBQP), which explicitly balances relevance and semantic diversity through an interpretable trade-off parameter. Inspired by recent advances in combinatorial optimization, we develop a tight non-convex continuous relaxation and a Frank–Wolfe-based algorithm, supported by a landscape analysis and convergence guarantees. Extensive experiments show that our method consistently dominates baselines on the relevance-diversity Pareto frontier while achieving significant speedups.
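The abstract does not spell out the CCBQP objective, so what follows is only a minimal sketch under stated assumptions. It assumes a relevance vector `r`, unit-normalized passage embeddings `E`, a hypothetical objective $f(\mathbf{x}) = \theta\,\mathbf{r}^\top\mathbf{x} - (1-\theta)\,\mathbf{x}^\top\mathbf{S}\mathbf{x}$ with $\mathbf{S} = \mathbf{E}\mathbf{E}^\top$, and the relaxed feasible set $\{\mathbf{x}\in[0,1]^n : \mathbf{1}^\top\mathbf{x} = k\}$. It illustrates the generic Frank–Wolfe pattern on that set, where the linear subproblem reduces to picking the top-$k$ gradient entries. It is not the authors' algorithm: this assumed objective is concave, unlike the paper's non-convex relaxation, and the function name `frank_wolfe_select` is hypothetical.

```python
import numpy as np


def frank_wolfe_select(r, E, k, theta=0.5, iters=200):
    """Sketch: pick k items by Frank-Wolfe on a relaxed relevance-diversity objective.

    ASSUMED objective (hypothetical, not the paper's exact CCBQP):
        f(x) = theta * r @ x - (1 - theta) * x @ S @ x,   S = E @ E.T
    maximized over C = {x in [0, 1]^n : sum(x) = k}.
    """
    n = r.shape[0]
    S = E @ E.T                   # cosine similarities when rows of E are unit-norm
    x = np.full(n, k / n)         # feasible starting point inside C
    for t in range(iters):
        grad = theta * r - 2.0 * (1.0 - theta) * (S @ x)
        # Linear maximization oracle over C: a 0/1 vertex with ones on the k largest gradient entries.
        s = np.zeros(n)
        s[np.argpartition(-grad, k - 1)[:k]] = 1.0
        gamma = 2.0 / (t + 2.0)   # standard open-loop Frank-Wolfe step size
        x = (1.0 - gamma) * x + gamma * s
    # Round the fractional iterate to a k-subset by keeping its k largest entries.
    idx = np.sort(np.argpartition(-x, k - 1)[:k])
    return idx, x


# Usage on synthetic data: 20 random unit-norm "passages" and a random query.
rng = np.random.default_rng(0)
E = rng.normal(size=(20, 8))
E /= np.linalg.norm(E, axis=1, keepdims=True)
q = rng.normal(size=8)
r = E @ (q / np.linalg.norm(q))   # relevance = cosine similarity to the query

idx, x = frank_wolfe_select(r, E, k=5, theta=0.7)
print("selected passages:", idx)
```

Each oracle output is a 0/1 vector with exactly $k$ ones, so every iterate stays feasible without any projection; this projection-free property is the usual reason to prefer Frank–Wolfe on a cardinality-constrained set. Setting `theta=1.0` removes the diversity term and the sketch reduces to plain top-$k$ relevance retrieval.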

Paper Structure

This paper contains 14 sections, 5 theorems, 26 equations, 1 figure, 2 tables, and 1 algorithm.

Key Result

Theorem 2.1

If $\lambda \geq 2$, the relaxation of problem (p2) to problem (p3) is tight for any $k$.
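Problems (p2) and (p3) are not reproduced in this overview. As a hedged illustration of what tightness means for a cardinality-constrained relaxation, one standard sufficient condition (not necessarily the argument behind Theorem 2.1) uses the relaxed set $\mathcal{C}_k$, whose vertices are exactly the feasible binary selections of $k$ items. If the relaxed objective $g$ is convex on $\mathcal{C}_k$, maximizing it over $\mathcal{C}_k$ is a non-convex problem, yet its maximum is attained at a vertex, so the relaxed and binary optima coincide. The symbols $\mathcal{C}_k$ and $g$ are introduced here for illustration only.

```latex
\begin{aligned}
\mathcal{C}_k &= \{\mathbf{x} \in [0,1]^n : \mathbf{1}^\top \mathbf{x} = k\},
\qquad
\operatorname{vert}(\mathcal{C}_k) = \{\mathbf{x} \in \{0,1\}^n : \mathbf{1}^\top \mathbf{x} = k\}, \\
g \text{ convex on } \mathcal{C}_k
\;&\Longrightarrow\;
\max_{\mathbf{x} \in \mathcal{C}_k} g(\mathbf{x})
= \max_{\mathbf{x} \in \operatorname{vert}(\mathcal{C}_k)} g(\mathbf{x})
= \max_{\mathbf{x} \in \{0,1\}^n,\ \mathbf{1}^\top \mathbf{x} = k} g(\mathbf{x}).
\end{aligned}
```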

Figures (1)

  • Figure 1: Relevance-diversity trade-off and computational efficiency on ASQA and QAMPARI across $k \in \{25, 50, 100\}$. The left panels show the Pareto frontier of Recall vs. ILAD obtained by varying the trade-off parameter $\theta \in [0.1, 0.9]$. The right panels report per-query wall-clock latency (ms). Our algorithm consistently yields higher-quality subsets and significant speedups.
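Figure 1 plots Recall against ILAD. Assuming ILAD is the common intra-list average distance, i.e. the mean pairwise cosine distance $1 - \cos$ among the retrieved passages (the paper's exact definition is not given in this overview), the metric and the Pareto filtering of a $\theta$ sweep can be sketched as follows. The helpers `ilad` and `pareto_front` and the sample points are hypothetical.

```python
import numpy as np


def ilad(E_sel):
    """ASSUMED ILAD definition: mean pairwise cosine distance (1 - cos) among selected items.

    Rows of E_sel must be unit-normalized embeddings; at least two rows are required.
    """
    m = E_sel.shape[0]
    S = E_sel @ E_sel.T
    iu = np.triu_indices(m, k=1)   # each unordered pair counted once
    return float(np.mean(1.0 - S[iu]))


def pareto_front(points):
    """Return the (recall, ilad) points not dominated by any other point (both maximized)."""
    front = []
    for i, (r_i, d_i) in enumerate(points):
        dominated = any(
            r_j >= r_i and d_j >= d_i and (r_j > r_i or d_j > d_i)
            for j, (r_j, d_j) in enumerate(points)
            if j != i
        )
        if not dominated:
            front.append((r_i, d_i))
    return sorted(front)


print(ilad(np.eye(3)))  # orthogonal embeddings -> 1.0
# Hypothetical (recall, ILAD) pairs from a theta sweep.
sweep = [(0.5, 0.2), (0.4, 0.3), (0.3, 0.1), (0.5, 0.1)]
print(pareto_front(sweep))  # -> [(0.4, 0.3), (0.5, 0.2)]
```

Both axes are treated as higher-is-better, so a setting survives only if no other $\theta$ setting matches or beats it on both Recall and ILAD while strictly beating it on at least one.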

Theorems & Definitions (13)

  • Theorem 2.1
  • proof
  • Corollary 2.3
  • Theorem 2.4: Strict Dichotomy of Stationary Points
  • proof
  • Theorem 2.5: Monotonicity of Local Maximizers
  • proof
  • Theorem 3.1: Local Exact Convergence
  • proof
  • proof
  • ...and 3 more