FiD-Light: Efficient and Effective Retrieval-Augmented Text Generation

Sebastian Hofstätter; Jiecao Chen; Karthik Raman; Hamed Zamani

TL;DR

FiD-Light addresses the high decoder cost of retrieval-augmented generation by compressing per-passage encoder outputs to a small number of vectors, reducing the autoregressive decoding workload while preserving effectiveness. It further strengthens provenance retrieval through a robust source-pointer re-ranking mechanism (FiD-Light-SP) that top-ranks model-identified passages without discarding the rest. Across seven KILT tasks, FiD-Light-SP improves the latency–effectiveness Pareto frontier and sets new state-of-the-art results on six tasks for combined generation and provenance retrieval, demonstrating practical efficiency gains. The approach remains compatible with larger backbones to offset any minor losses in generation quality, enabling scalable, efficient retrieval-augmented generation in real-world settings.
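
The re-ranking idea in the summary, where passages the model points to are moved to the top and the remaining passages are kept, can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function name `rerank_by_source_pointers`, the representation of source pointers as a list of integer indices, and the handling of invalid or repeated pointers are all assumptions.

```python
def rerank_by_source_pointers(passages, pointed_indices):
    """Move pointed-to passages to the front; keep the rest in retrieval order.

    passages: list of retrieved passages, in the retriever's original order.
    pointed_indices: indices the model emitted as source pointers (hypothetical format).
    """
    seen = set()
    top = []
    for i in pointed_indices:
        # Assumption: out-of-range or repeated pointers are silently ignored.
        if 0 <= i < len(passages) and i not in seen:
            seen.add(i)
            top.append(passages[i])
    # Nothing is discarded: unpointed passages follow in their original order.
    rest = [p for j, p in enumerate(passages) if j not in seen]
    return top + rest


retrieved = ["p0", "p1", "p2", "p3"]
reranked = rerank_by_source_pointers(retrieved, [2, 0, 2, 9])
```

Because the unpointed passages stay in the list, a wrong or missing pointer only changes the order of the results and never shrinks the returned provenance set.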

Abstract

Retrieval-augmented generation models offer many benefits over standalone language models: besides a textual answer to a given query, they provide provenance items retrieved from an updateable knowledge base. However, they are also more complex systems and need to handle long inputs. In this work, we introduce FiD-Light to strongly increase the efficiency of the state-of-the-art retrieval-augmented FiD model, while maintaining the same level of effectiveness. Our FiD-Light model constrains the information flow from the encoder (which encodes passages separately) to the decoder (using concatenated encoded representations). Furthermore, we adapt FiD-Light with re-ranking capabilities through textual source pointers, to improve the top-ranked provenance precision. Our experiments on a diverse set of seven knowledge-intensive tasks (KILT) show that FiD-Light consistently improves the Pareto frontier between query latency and effectiveness. FiD-Light with source pointing sets substantial new state-of-the-art results on six KILT tasks for combined text generation and provenance retrieval evaluation, while maintaining reasonable efficiency.
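
As a rough illustration of constraining the encoder-to-decoder information flow, the sketch below takes already encoded passages (each a list of token vectors) and keeps only a fixed number of vectors per passage before concatenating them for the decoder. Keeping the first `k` vectors is an assumption made for this sketch, as are the names `compress_and_concat` and `k`; the abstract does not specify the selection rule, and no real encoder is involved here.

```python
def compress_and_concat(encoded_passages, k):
    """Keep at most k vectors per encoded passage, then concatenate.

    encoded_passages: list of passages, each a list of token vectors
                      (plain Python lists stand in for encoder outputs).
    k: number of vectors kept per passage (assumed: the first k).
    """
    decoder_input = []
    for vectors in encoded_passages:
        decoder_input.extend(vectors[:k])
    return decoder_input


# Toy data: 3 passages, 5 token vectors each, hidden size 2.
encoded = [[[float(p), float(t)] for t in range(5)] for p in range(3)]

full = compress_and_concat(encoded, k=5)   # no compression
light = compress_and_concat(encoded, k=2)  # compressed decoder input
```

In this toy setting the decoder input shrinks from 15 vectors to 6, which is the kind of reduction that would lower the cost of each autoregressive decoding step.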

Paper Structure (23 sections, 7 equations, 6 figures, 6 tables)

Introduction
Background and Related Work
FiD (Fusion in Decoder) with Explanations
Related Work
Efficient Generation Models.
Retrieval-Enhanced Machine Learning.
Improving and Adapting the FiD Model.
FiD-Light with Source Pointers
Decoder Efficiency.
Source Pointing Robustness
Results
Influence of the Retriever
Source Pointer Robustness
Efficiency - Effectiveness Tradeoff
Comparison to Related Work
...and 8 more sections

Figures (6)

Figure 1: Average inference latency for a query of FiD & FiD-Light (T5-Base on a single TPUv4).
Figure 2: Overview of the FiD-Light architecture and workflow with source pointers. We highlight our two main contributions: ➊ Compressing the encoded vectors per passage, before concatenating and feeding them through the decoder; ➋ Increasing the robustness of source pointers, by using the model as re-ranker.
Figure 3: Distributions of source pointer passages for FiD-Light$^\text{SP}$ (T5-Base).
Figure 4: Comparing the capability to select the two relevant passages in HotpotQA for FiD-Light$^\text{SP}$ and FiD$^\text{SP}$.
Figure 5: Comparing FiD-Light$^\text{SP}$ with FiD$^\text{SP}$ on KILT-scores modulating the number of input passages on FiD and the number of decoder-input vectors on FiD-Light.
...and 1 more figure
