Generative Retrieval with Few-shot Indexing

Arian Askari, Chuan Meng, Mohammad Aliannejadi, Zhaochun Ren, Evangelos Kanoulas, Suzan Verberne

TL;DR

The paper tackles the high cost and rigidity of training-based indexing in generative retrieval by introducing Few-Shot GR, a training-free approach that uses LLM prompting to create a docid bank via few-shot indexing. It further enhances retrieval by adopting a one-to-many mapping, generating multiple docids per document, and constraining the LLM during retrieval to select docids from the bank. Experimental results on Natural Questions and MS MARCO demonstrate competitive or superior performance relative to trained GR methods, while dramatically reducing indexing effort. The work highlights the importance of both the number of docids per document and the choice of LLM, and discusses potential extensions to larger datasets and dynamic corpora.

Abstract

Existing generative retrieval (GR) methods rely on training-based indexing, which fine-tunes a model to memorise associations between queries and the document identifiers (docids) of relevant documents. Training-based indexing suffers from high training costs, under-utilisation of pre-trained knowledge in large language models (LLMs), and limited adaptability to dynamic document corpora. To address these issues, we propose a few-shot indexing-based GR framework (Few-Shot GR). Its few-shot indexing process requires no training: we prompt an LLM to generate docids for all documents in a corpus, creating a docid bank for the entire corpus. During retrieval, we feed a query to the same LLM, constrain it to generate a docid within the docid bank created during indexing, and then map the generated docid back to its corresponding document. Moreover, we devise few-shot indexing with one-to-many mapping to further enhance Few-Shot GR. Experiments show that Few-Shot GR achieves superior performance to state-of-the-art GR methods requiring heavy training.
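The retrieval step described in the abstract (generate a docid constrained to the docid bank, then map it back to a document) can be illustrated with a small prefix-trie sketch. This is a toy illustration under stated assumptions, not the authors' implementation: the docids, the `<eos>` marker, the whitespace tokenisation, the greedy decoding, and the `overlap_scorer` function (a stand-in for the LLM's next-token scores) are all hypothetical.

```python
from typing import Callable, Dict, List

END = "<eos>"  # hypothetical end-of-docid marker


def build_bank(doc_to_docids: Dict[str, List[str]]) -> Dict[str, str]:
    """Invert a one-to-many document -> docids mapping into docid -> document."""
    bank: Dict[str, str] = {}
    for doc, docids in doc_to_docids.items():
        for docid in docids:
            bank.setdefault(docid, doc)  # on a collision, the first document keeps the docid
    return bank


def build_trie(bank: Dict[str, str]) -> dict:
    """Build a prefix trie over the whitespace tokens of every docid in the bank."""
    trie: dict = {}
    for docid in bank:
        node = trie
        for tok in docid.split() + [END]:
            node = node.setdefault(tok, {})
    return trie


def constrained_decode(trie: dict, score: Callable[[List[str], str], float]) -> str:
    """Greedy decoding in which only tokens that keep the prefix inside the bank are allowed."""
    prefix: List[str] = []
    node = trie
    while True:
        allowed = sorted(node)  # valid continuations of the current prefix
        tok = max(allowed, key=lambda t: score(prefix, t))
        if tok == END:
            return " ".join(prefix)
        prefix.append(tok)
        node = node[tok]


def overlap_scorer(query: str) -> Callable[[List[str], str], float]:
    """Toy stand-in for LLM next-token scores: favour tokens that occur in the query."""
    words = set(query.lower().split())
    return lambda prefix, tok: 1.0 if tok in words else 0.0


def retrieve(query: str, bank: Dict[str, str], trie: dict) -> str:
    """Generate a docid under the trie constraint and map it back to its document."""
    docid = constrained_decode(trie, overlap_scorer(query))
    return bank[docid]


# Hypothetical corpus: doc_A has two docids (one-to-many mapping), doc_B has one.
bank = build_bank({
    "doc_A": ["eiffel tower height", "paris landmark"],
    "doc_B": ["python list sorting"],
})
trie = build_trie(bank)
```

Because every decoding step is restricted to the trie, the output is always a docid that exists in the bank. In this toy corpus, `retrieve("how tall is the eiffel tower", bank, trie)` and `retrieve("famous landmark in paris", bank, trie)` reach the same document through different docids, which is the intuition behind giving each document several docids.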

Paper Structure

This paper contains 5 sections, 2 equations, 2 figures, 3 tables.

Figures (2)

  • Figure 1: Prompt used for indexing and retrieval. The three queries in the demonstration examples are sampled from NQ's training set (Kwiatkowski et al., 2019), while their corresponding docids are annotated by the authors.
  • Figure 2: Few-Shot GR's retrieval quality w.r.t. the number of generated docids per document in few-shot indexing on NQ320K.