Retro-li: Small-Scale Retrieval Augmented Generation Supporting Noisy Similarity Searches and Domain Shift Generalization
Gentiana Rashiti, Geethan Karunaratne, Mrinmaya Sachan, Abu Sebastian, Abbas Rahimi
TL;DR
Retro-li demonstrates that small-scale retrieval can meaningfully improve language modeling and domain generalization when paired with semantic similarity search for neighbors and regularization of the non-parametric memory. By retrieving neighbors with SBert-based embeddings and applying a Gaussian regularizer to the non-parametric memory, the approach stays robust under noisy retrieval, including hardware-imposed noise, and supports plug-and-play domain shifts without extensive fine-tuning. The work also discusses moving retrieval to analog in-memory computing hardware, which could offer O(1) search time with minimal performance loss, and provides extensive ablations, qualitative analyses, and fine-tuning insights. Overall, Retro-li broadens the practical applicability of retrieval-augmented models to medium-sized architectures and constrained memory regimes, with clear pathways toward hardware-aware training and domain-specific applications.
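To make the retrieval step concrete, the following Python sketch retrieves the top-k neighbors of a query from a small non-parametric memory by cosine similarity and then perturbs the retrieved neighbor embeddings with Gaussian noise. It rests on stated assumptions: the toy vectors stand in for SBert sentence embeddings, the helper names (`cosine`, `top_k_neighbors`, `regularize`) and the noise scale `sigma` are hypothetical, and modeling the memory regularizer as additive Gaussian noise applied during training is an assumption made for illustration; the paper's exact formulation may differ.

```python
import math
import random
from typing import List, Sequence

Vector = List[float]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return dot / (na * nb)


def top_k_neighbors(query: Sequence[float], memory: Sequence[Sequence[float]], k: int) -> List[int]:
    """Indices of the k memory entries most similar to the query, best first."""
    scores = [cosine(query, entry) for entry in memory]
    return sorted(range(len(memory)), key=lambda i: scores[i], reverse=True)[:k]


def regularize(neighbors: Sequence[Sequence[float]], sigma: float, rng: random.Random) -> List[Vector]:
    """Add i.i.d. Gaussian noise N(0, sigma^2) to every coordinate of each neighbor.

    Hypothetical stand-in for a training-time regularizer on the
    non-parametric memory; at inference one would use sigma = 0.
    """
    return [[x + rng.gauss(0.0, sigma) for x in vec] for vec in neighbors]


if __name__ == "__main__":
    # Toy 3-d "embeddings"; real SBert vectors are much higher-dimensional.
    memory = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.7, 0.7, 0.0], [0.0, 0.0, 1.0]]
    query = [0.9, 0.1, 0.0]
    idx = top_k_neighbors(query, memory, k=2)
    print(idx)  # → [0, 2]
    noisy = regularize([memory[i] for i in idx], sigma=0.1, rng=random.Random(0))
    print(noisy)
```

Scoring every entry by brute force is linear in the memory size; that per-query cost is what the analog in-memory computing hardware mentioned above aims to replace with O(1) search.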
Abstract
Retrieval augmented generation (RAG) systems such as Retro have been shown to improve language modeling capabilities and reduce toxicity and hallucinations by retrieving from a database of non-parametric memory containing trillions of entries. We introduce Retro-li, which shows that retrieval can also help with a small-scale database, but that a smaller, and hence sparser, non-parametric memory demands more accurate neighbors. This requirement can be met with a proper semantic similarity search. We further propose, for the first time, adding a regularization to the non-parametric memory: it significantly reduces perplexity when the neighbor search operations are noisy during inference, and it improves generalization when a domain shift occurs. We also show that Retro-li's non-parametric memory can potentially be implemented on analog in-memory computing hardware, which offers O(1) search time at the cost of noisy neighbor retrieval, with minimal (<1%) performance loss. Our code is available at: https://github.com/IBM/Retrieval-Enhanced-Transformer-Little.
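The effect of a noisy neighbor search can be probed in a similar spirit. The Python sketch below adds zero-mean Gaussian noise to every similarity score before taking the arg-max and reports how often the noisy top-1 neighbor matches the exact one. Additive score noise is a hypothetical, simplified stand-in for the noise of analog in-memory computing hardware, and the function names and the `noise_std` parameter are illustrative; the sketch does not reproduce the paper's measurements.

```python
import random
from typing import Sequence


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Inner-product similarity between two vectors."""
    return sum(x * y for x, y in zip(a, b))


def noisy_top1(query: Sequence[float], memory: Sequence[Sequence[float]],
               noise_std: float, rng: random.Random) -> int:
    """Index of the best memory entry after adding N(0, noise_std^2) to each score.

    Hypothetical model of analog read noise; noise_std = 0 gives exact search.
    """
    scores = [
        dot(query, m) + (rng.gauss(0.0, noise_std) if noise_std > 0 else 0.0)
        for m in memory
    ]
    # max() picks the first index if scores tie.
    return max(range(len(scores)), key=scores.__getitem__)


def top1_agreement(queries: Sequence[Sequence[float]], memory: Sequence[Sequence[float]],
                   noise_std: float, seed: int = 0) -> float:
    """Fraction of queries whose noisy top-1 neighbor equals the exact top-1 neighbor."""
    rng = random.Random(seed)
    hits = 0
    for q in queries:
        exact = noisy_top1(q, memory, 0.0, rng)
        noisy = noisy_top1(q, memory, noise_std, rng)
        hits += int(exact == noisy)
    return hits / len(queries)


if __name__ == "__main__":
    rng = random.Random(42)
    dim, n_mem, n_q = 16, 200, 100
    memory = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_mem)]
    # Queries are perturbed copies of memory entries, so each has a clear best match.
    queries = [[x + rng.gauss(0.0, 0.3) for x in memory[i]] for i in range(n_q)]
    for std in (0.0, 0.5, 2.0, 8.0):
        print(std, top1_agreement(queries, memory, std))
```

In this toy setting, agreement tends to drop as `noise_std` grows relative to the gap between the best and second-best scores; the memory regularization described above targets the model's tolerance of such retrieval errors rather than the errors themselves.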
