FiD-Light: Efficient and Effective Retrieval-Augmented Text Generation
Sebastian Hofstätter, Jiecao Chen, Karthik Raman, Hamed Zamani
TL;DR
FiD-Light addresses the high decoder cost of retrieval-augmented generation by compressing per-passage encoder outputs to a small number of vectors, reducing the autoregressive decoding workload while preserving effectiveness. It further strengthens provenance retrieval through a robust source-pointer re-ranking mechanism (FiD-Light-SP) that top-ranks model-identified passages without discarding the rest. Across seven KILT tasks, FiD-Light-SP improves the latency–effectiveness Pareto frontier and sets new state-of-the-art results on six tasks for combined generation and provenance retrieval, demonstrating practical efficiency gains. The approach remains compatible with larger backbones to offset any minor losses in generation quality, enabling scalable, efficient retrieval-augmented generation in real-world settings.
Abstract
Retrieval-augmented generation models offer many benefits over standalone language models: besides a textual answer to a given query, they provide provenance items retrieved from an updateable knowledge base. However, they are also more complex systems and need to handle long inputs. In this work, we introduce FiD-Light to strongly increase the efficiency of the state-of-the-art retrieval-augmented FiD model, while maintaining the same level of effectiveness. Our FiD-Light model constrains the information flow from the encoder (which encodes passages separately) to the decoder (using concatenated encoded representations). Furthermore, we adapt FiD-Light with re-ranking capabilities through textual source pointers, to improve the top-ranked provenance precision. Our experiments on a diverse set of seven knowledge-intensive tasks (KILT) show FiD-Light consistently improves the Pareto frontier between query latency and effectiveness. FiD-Light with source pointing sets substantial new state-of-the-art results on six KILT tasks for combined text generation and provenance retrieval evaluation, while maintaining reasonable efficiency.
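To make the two mechanisms in the abstract concrete, the following is a minimal sketch in plain Python, not the authors' implementation. It assumes that the encoder-to-decoder constraint is realized by keeping only the first k vectors of each separately encoded passage before concatenation, and that source pointers arrive as a list of positions into the retrieved ranking. The function names, the parameter k, the pointer format, and the choice to order pointed passages by pointer order are all illustrative assumptions.

```python
def compress_encoder_outputs(encoded_passages, k):
    """Keep the first k vectors of each passage, then concatenate.

    encoded_passages: list of passages, each a list of token vectors
    (each vector a list of floats). This layout is an assumption made
    for illustration; a real model would use tensors.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    decoder_input = []
    for passage in encoded_passages:
        # Slicing keeps at most k vectors; shorter passages are kept whole.
        decoder_input.extend(passage[:k])
    return decoder_input


def rerank_with_source_pointers(ranked_ids, pointed_positions):
    """Move pointed passages to the top; keep all others in original order.

    ranked_ids: passage ids in retrieval order.
    pointed_positions: 0-based positions the generator marked as sources
    (hypothetical format). Invalid or repeated pointers are ignored, so a
    noisy pointer list never drops a passage from the ranking.
    """
    seen = set()
    top = []
    for pos in pointed_positions:
        if 0 <= pos < len(ranked_ids) and pos not in seen:
            seen.add(pos)
            top.append(ranked_ids[pos])
    rest = [pid for i, pid in enumerate(ranked_ids) if i not in seen]
    return top + rest


if __name__ == "__main__":
    # Three toy passages of five 2-dimensional vectors each.
    encoded = [[[float(p), float(t)] for t in range(5)] for p in range(3)]
    compressed = compress_encoder_outputs(encoded, k=2)
    print(len(compressed))  # number of vectors passed to the decoder

    print(rerank_with_source_pointers(["p1", "p2", "p3", "p4"], [2, 0]))
```

Under these assumptions the decoder attends over n × k vectors instead of n × sequence length, which is where the latency saving would come from, and the re-ranking step only reorders the retrieved list rather than filtering it.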
