Improving Retrieval Augmented Open-Domain Question-Answering with Vectorized Contexts
Zhuo Chen, Xinyu Wang, Yong Jiang, Pengjun Xie, Fei Huang, Kewei Tu
TL;DR
This work tackles open-domain QA under context-length constraints by introducing a lightweight encoder that vectorizes additional retrieved contexts, which the large language model then attends to via cross-attention. The approach extends the effective context length from the baseline's 2k tokens to roughly 5k–10k tokens, with the extra context held in dense (vectorized) form, while keeping compute near the baseline. Empirical results across held-in, held-out, and ICL settings show consistent improvements over a strong 2k-token baseline, with the frozen-encoder strategy offering the most stable gains. The method provides a simple, general way to exploit longer contexts in RAG-based ODQA without large increases in computational resources, though it remains to be tested on larger LMs and in broader ICL scenarios.
Abstract
In the era of large language models, techniques such as Retrieval Augmented Generation can better address Open-Domain Question-Answering problems. Because of constraints such as model size and computing resources, the context length is often limited, and it becomes challenging to enable the model to cover overlong contexts while answering questions from open domains. This paper proposes a general and convenient method for covering longer contexts in Open-Domain Question-Answering tasks. It leverages a small encoder language model that effectively encodes contexts, and the resulting encodings interact with the original inputs through cross-attention. With our method, the original language model can cover contexts several times longer while keeping the computing requirements close to the baseline. Our experiments demonstrate that, after fine-tuning, performance improves across two held-in datasets, four held-out datasets, and two In-Context Learning settings.
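
The mechanism described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names `encode_contexts` and `cross_attend`, the mean pooling of each context into a single vector, the residual connection, and the random weights standing in for a trained encoder and trained projections are all assumptions made for illustration. The sketch only shows how a handful of context vectors can serve as keys and values for the language model's hidden states, so that the extra contexts cost a few vectors rather than thousands of tokens.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def encode_contexts(chunks, w_enc):
    """Hypothetical stand-in for the small encoder.

    Each chunk is an array of shape (chunk_len, d_enc). Assumption: a chunk
    is mean-pooled to one vector and projected to the LM width d_model.
    """
    pooled = np.stack([c.mean(axis=0) for c in chunks])  # (n_chunks, d_enc)
    return pooled @ w_enc                                # (n_chunks, d_model)

def cross_attend(hidden, ctx_vecs, w_q, w_k, w_v):
    """LM hidden states (queries) attend to context vectors (keys/values)."""
    q = hidden @ w_q                                     # (seq_len, d_model)
    k = ctx_vecs @ w_k                                   # (n_chunks, d_model)
    v = ctx_vecs @ w_v                                   # (n_chunks, d_model)
    scores = q @ k.T / np.sqrt(q.shape[-1])              # (seq_len, n_chunks)
    weights = softmax(scores, axis=-1)
    # Assumed residual connection: add the attended context to the LM states.
    return hidden + weights @ v, weights

rng = np.random.default_rng(0)
d_enc, d_model, seq_len = 8, 16, 5

# Three extra retrieved contexts of different lengths (toy token embeddings).
chunks = [rng.normal(size=(n, d_enc)) for n in (30, 45, 20)]
w_enc = rng.normal(size=(d_enc, d_model))
w_q, w_k, w_v = (rng.normal(size=(d_model, d_model)) * 0.1 for _ in range(3))
hidden = rng.normal(size=(seq_len, d_model))  # LM states for the original input

ctx_vecs = encode_contexts(chunks, w_enc)
fused, weights = cross_attend(hidden, ctx_vecs, w_q, w_k, w_v)
print(fused.shape, weights.shape)
```

In this toy setup the 95 context tokens are reduced to three vectors, so the attention score matrix has shape `(seq_len, 3)` instead of growing with the raw context length, which is the intuition behind keeping compute close to the baseline.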
