Beyond Single Embeddings: Capturing Diverse Targets with Multi-Query Retrieval

Hung-Ting Chen, Xiang Liu, Shauli Ravfogel, Eunsol Choi

TL;DR

AMER demonstrates that traditional single-vector retrievers struggle to cover multimodal target distributions in multi-answer retrieval tasks. By autoregressively generating multiple query embeddings and training with a matching loss plus Hungarian alignment, AMER captures diverse target distributions and improves retrieval on synthetic data by up to 4x and on real-world AmbigQA/QAMPARI datasets, particularly when target embeddings are farther apart. The method combines InfoNCE-based learning with scheduled sampling to simulate inference and uses a fixed document encoder to keep retrieval efficient. These results highlight the need for diverse query representations in retrieval systems and open avenues for adaptive multi-output strategies and more robust diversity benchmarks. The work suggests practical impact for retrieval-augmented generation and shows promising gains in challenging multimodal target scenarios.
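The matching step mentioned above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the temperature value, and the choice of negatives are all hypothetical, and a brute-force search over permutations stands in for the Hungarian algorithm (it finds the same optimal assignment, but is only practical for a handful of embeddings). Each predicted query embedding is paired with one target embedding so that total similarity is maximal, and an InfoNCE loss is then averaged over the matched pairs.

```python
import itertools
import math


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def best_assignment(preds, targets):
    """Return perm such that preds[i] is matched to targets[perm[i]].

    Brute force over permutations (a stand-in for the Hungarian algorithm);
    assumes len(preds) == len(targets) and a small number of embeddings.
    """
    n = len(preds)
    return max(
        itertools.permutations(range(n)),
        key=lambda perm: sum(dot(preds[i], targets[perm[i]]) for i in range(n)),
    )


def info_nce(query, positive, negatives, temperature=0.05):
    """Negative log-softmax of the positive score among positive + negatives."""
    logits = [dot(query, positive) / temperature]
    logits += [dot(query, neg) / temperature for neg in negatives]
    m = max(logits)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - logits[0]


def matching_loss(preds, targets, negatives, temperature=0.05):
    """Average InfoNCE loss over optimally matched (prediction, target) pairs."""
    perm = best_assignment(preds, targets)
    losses = [
        info_nce(preds[i], targets[perm[i]], negatives, temperature)
        for i in range(len(preds))
    ]
    return sum(losses) / len(losses), perm
```

Because the assignment is recomputed for each example, the loss does not depend on the order in which the targets are listed; which documents serve as negatives is left to the caller in this sketch.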

Abstract

Most text retrievers generate one query vector to retrieve relevant documents. Yet the conditional distribution of relevant documents for a query may be multimodal, e.g., representing different interpretations of the query. We first quantify the limitations of existing retrievers: all retrievers we evaluate struggle more as the distance between target document embeddings grows. To address this limitation, we develop a new retriever architecture, the Autoregressive Multi-Embedding Retriever (AMER). Our model autoregressively generates multiple query vectors, and all the predicted query vectors are used to retrieve documents from the corpus. We show that on synthetic vectorized data, the proposed method can capture multiple target distributions perfectly, achieving 4x better performance than a single-embedding model. We also fine-tune our model on real-world multi-answer retrieval datasets and evaluate in-domain. AMER achieves 4% and 21% relative gains over single-embedding baselines on the two datasets we evaluate on. Furthermore, we consistently observe larger gains on the subset of each dataset where the embeddings of the target documents are less similar to each other. We demonstrate the potential of a multi-query vector retriever and open up a new direction for future work.
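To make the idea of retrieving with several query vectors concrete, here is a small sketch. The abstract does not say how the per-query results are combined, so the round-robin merge below is an assumption, as are the function names and the use of inner-product scoring.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def top_n(query, docs, n):
    """Indices of the n documents with the highest inner product, best first."""
    ranked = sorted(range(len(docs)), key=lambda i: dot(query, docs[i]), reverse=True)
    return ranked[:n]


def multi_query_retrieve(queries, docs, k):
    """Merge per-query rankings round-robin, dropping duplicates, up to k docs.

    The merge strategy is an illustrative assumption, not the paper's method.
    """
    rankings = [top_n(q, docs, k) for q in queries]
    merged, seen = [], set()
    for rank in range(k):
        for ranking in rankings:
            if rank >= len(ranking):
                continue
            doc_id = ranking[rank]
            if doc_id in seen:
                continue
            seen.add(doc_id)
            merged.append(doc_id)
            if len(merged) == k:
                return merged
    return merged


# Two clusters of documents: ids 0-1 near one axis, ids 2-3 near the other.
docs = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
single = multi_query_retrieve([[1.0, 0.0]], docs, k=2)
multi = multi_query_retrieve([[1.0, 0.0], [0.0, 1.0]], docs, k=2)
```

With a budget of two documents, the single query vector returns both documents from one cluster, while the two query vectors return one document from each cluster.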

Paper Structure

This paper contains 56 sections, 3 equations, 6 figures, 8 tables.

Figures (6)

  • Figure 1: Model performance by diversity of the target document set. We report performance on the whole test set (the leftmost subplots) and on subsets with different numbers of target documents (answer set size). We partition the dataset into 4 bins (<25%, 25-50%, 50-75%, and >75%) by distance between target document embeddings. As the distance grows, performance worsens. The trend holds for all models and is more pronounced on the QAMPARI dataset, which has more answers per query and larger distances.
  • Figure 2: Visualization of AMER. We visualize both the training (left) and inference (right) procedures. The proposed model takes as input the target document embedding (order decided randomly) or the embedding predicted in the previous step, and outputs the next embedding. Linear layers are added to ensure consistent dimensions. During inference, AMER predicts the first embedding after seeing the query text, and outputs multiple query embeddings autoregressively.
  • Figure 3: Results on synthetic data for Linear (orange) and MLP (blue) transformations. The y-axis shows performance scores, MRecall@100 and MRecall@10. We evaluate systems on different input distributions, from a Single multivariate Gaussian to Multiple distributions, as outlined, and out-of-distribution (OOD) distributions. Each section represents one input distribution. AMER (the right half of each section) can successfully model multiple target distributions, while the Single-Query model (left) struggles.
  • Figure 4: Performance on the whole and low-similarity test sets for multiple base models. AMER outperforms baselines in most settings. For all models, we observe a stronger gain on the low-similarity set. The gains are also larger on QAMPARI, which has a more diverse target distribution.
  • Figure 5: Vector similarity between multiple query embeddings from AMER, and the training data. "Between Targets" denotes the pairwise distance between target embeddings in the training dataset. Larger models exhibit overall higher diversity.
  • ...and 1 more figures
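
The kind of analysis described in the Figure 1 and Figure 3 captions can be sketched as follows. Several details are assumptions rather than facts from the paper: Euclidean distance as the distance measure, the mean over all target pairs as the per-query diversity score, and a commonly used definition of MRecall@k (success if the top k results cover all targets, or at least k of them when there are more than k targets). All function names are hypothetical.

```python
import itertools
import math


def mean_pairwise_distance(embeddings):
    """Mean Euclidean distance over all pairs of target embeddings (assumed measure)."""
    pairs = list(itertools.combinations(embeddings, 2))
    if not pairs:
        return 0.0
    return sum(math.dist(u, v) for u, v in pairs) / len(pairs)


def quartile_bin(value, all_values):
    """Bin index 0..3 based on the fraction of values strictly below `value`."""
    frac = sum(v < value for v in all_values) / len(all_values)
    return min(int(frac * 4), 3)


def mrecall_at_k(retrieved, targets, k):
    """1.0 if the top-k results cover all targets, or k of them when there are more than k."""
    hits = len(set(retrieved[:k]) & set(targets))
    return 1.0 if hits >= min(k, len(targets)) else 0.0
```

Grouping per-query MRecall@k scores by `quartile_bin` of the query's `mean_pairwise_distance` gives one average score per distance bin, which is the shape of comparison the Figure 1 caption describes.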