Retro-li: Small-Scale Retrieval Augmented Generation Supporting Noisy Similarity Searches and Domain Shift Generalization

Gentiana Rashiti, Geethan Karunaratne, Mrinmaya Sachan, Abu Sebastian, Abbas Rahimi

TL;DR

Retro-li demonstrates that small-scale retrieval can meaningfully boost language modeling and domain generalization when paired with semantic-neighbor search and memory regularization. By using SBert-based embeddings for neighbor retrieval and a Gaussian regularizer on the non-parametric memory, the approach achieves robust performance under noisy retrieval and hardware-imposed noise, while enabling plug-and-play domain shifts without extensive fine-tuning. The work also discusses moving retrieval to analog in-memory computing hardware, which potentially offers $O(1)$ search times with minimal performance loss, and provides extensive ablations, qualitative analyses, and fine-tuning insights. Overall, Retro-li broadens the practical applicability of retrieval-augmented models to medium-sized architectures and constrained memory regimes, with clear pathways for hardware-aware training and domain-specific applications.

Abstract

Retrieval augmented generation (RAG) systems such as Retro have been shown to improve language modeling capabilities and reduce toxicity and hallucinations by retrieving from a database of non-parametric memory containing trillions of entries. We introduce Retro-li, which shows that retrieval can also help with a small-scale database, but that it demands more accurate and better neighbors when searching in a smaller, hence sparser, non-parametric memory. This demand can be met by using a proper semantic similarity search. We further propose, for the first time, adding a regularization to the non-parametric memory: it significantly reduces perplexity when the neighbor search operations are noisy during inference, and it improves generalization when a domain shift occurs. We also show that Retro-li's non-parametric memory can potentially be implemented on analog in-memory computing hardware, which exhibits $O(1)$ search time while introducing noise in neighbor retrieval, with minimal (<1%) performance loss. Our code is available at: https://github.com/IBM/Retrieval-Enhanced-Transformer-Little.
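The idea of a noisy similarity search over a small non-parametric memory can be illustrated with a toy sketch. This is not the authors' implementation: the helper names (`cosine`, `add_gaussian_noise`, `top_k`), the three-dimensional toy vectors, the noise level `sigma`, and the choice to perturb the stored embeddings directly are all assumptions made for illustration. It only shows how zero-mean Gaussian noise on memory entries can perturb which neighbors a cosine-similarity search returns.

```python
import math
import random


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def add_gaussian_noise(vec, sigma, rng):
    """Return a copy of vec with zero-mean Gaussian noise (std sigma) added per entry."""
    return [x + rng.gauss(0.0, sigma) for x in vec]


def top_k(query, memory, k):
    """Indices of the k memory entries most similar to query (exhaustive search)."""
    order = sorted(range(len(memory)),
                   key=lambda i: cosine(query, memory[i]),
                   reverse=True)
    return order[:k]


# Hypothetical toy memory of four embeddings and one query.
rng = random.Random(0)
memory = [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
query = [1.0, 0.05, 0.0]

clean_neighbors = top_k(query, memory, 2)

# Assumed noise model: perturb every stored embedding before searching.
noisy_memory = [add_gaussian_noise(v, 0.05, rng) for v in memory]
noisy_neighbors = top_k(query, noisy_memory, 2)
```

Comparing `clean_neighbors` with `noisy_neighbors` for increasing `sigma` gives a rough feel for how quickly retrieved neighbors degrade. A real system would use an approximate index rather than the exhaustive search shown here.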

Paper Structure (56 sections, 2 equations, 15 figures, 22 tables, 2 algorithms)

Figures (15)

  • Figure 1: (a) Classic encoder-decoder (b) RAG encoder-decoder.
  • Figure 2: Retro-li architecture diagram. The file index (IVF) is created in one pass. For each chunk $C_i$ (here 4 tokens) we get one neighbor. To generate the token for chunk $C_2$, we utilize the neighbor of chunk $C_1$. For Retro-li layers 1-5,7,8,10,11 there is no chunked cross-attention (CCA) block. The information flows directly from the GPT-2 attention block to the FFW layer.
  • Figure 3: Retrieval in action. (a) Extract of the chunk used to query the index (b) Closest neighbor in the retrieval database (c) Second closest neighbor in the retrieval database.
  • Figure 4: One-gram Jaccard similarity of sequences and their two nearest neighbors (a) For WikiText-103-Train sequences (b) For WikiText-103-Validation sequences.
  • Figure 5: One-gram Jaccard similarity of sequences and their ten nearest neighbors (a) For WikiText-103-Train sequences (b) For WikiText-103-Validation sequences.
  • ...and 10 more figures
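
The one-gram Jaccard similarity named in the captions of Figures 4 and 5 is the size of the intersection of two unigram sets divided by the size of their union. A minimal sketch follows, assuming lowercased whitespace tokenization. The paper may tokenize differently (for example with the model's own tokenizer), and the function name `one_gram_jaccard` is hypothetical.

```python
def one_gram_jaccard(text_a, text_b):
    """Jaccard similarity between the unigram sets of two texts.

    Assumption: tokens are lowercased, whitespace-separated words.
    """
    grams_a = set(text_a.lower().split())
    grams_b = set(text_b.lower().split())
    union = grams_a | grams_b
    if not union:
        # Both texts are empty; define the similarity as 0.0 by convention.
        return 0.0
    return len(grams_a & grams_b) / len(union)


# Two shared unigrams ("the", "cat") out of four distinct ones.
score = one_gram_jaccard("the cat sat", "the cat ran")
```

Applied to a sequence and each of its retrieved neighbors, a score close to 1 indicates near-duplicate text, while a score close to 0 indicates little lexical overlap.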