Efficient Context Selection for Long-Context QA: No Tuning, No Iteration, Just Adaptive-$k$
Chihiro Taguchi, Seiji Maekawa, Nikita Bhutani
TL;DR
Adaptive-$k$ introduces a single-pass, plug-and-play retrieval method for long-context QA that selects the number of passages by locating the largest gap in the distribution of query-to-passage similarities, enabling per-query context sizing without tuning. It achieves substantial token reductions (up to 99% in factoid QA and 2x–10x in aggregation QA) while maintaining or improving accuracy across multiple LCLMs and embedding models. The approach is validated on HELMET and HoloBench benchmarks, showing robust gains across diverse models and demonstrating that dynamic context sizing improves both efficiency and answer quality in open-domain QA. Overall, Adaptive-$k$ offers a practical, model-agnostic alternative to fixed retrieval budgets and iterative adaptive methods, suitable for API-based deployments and large-scale QA systems.
Abstract
Retrieval-augmented generation (RAG) and long-context language models (LCLMs) both address context limitations of LLMs in open-domain question answering (QA). However, determining the optimal amount of external context to retrieve remains an open problem: fixing the retrieval size risks either wasting tokens or omitting key evidence. Existing adaptive methods like Self-RAG and Self-Route rely on iterative LLM prompting and perform well on factoid QA, but struggle with aggregation QA, where the optimal context size is both unknown and variable. We present Adaptive-$k$ retrieval, a simple and effective single-pass method that adaptively selects the number of passages based on the distribution of the similarity scores between the query and the candidate passages. It does not require model fine-tuning, extra LLM inferences, or changes to existing retriever-reader pipelines. On both factoid and aggregation QA benchmarks, Adaptive-$k$ matches or outperforms fixed-$k$ baselines while using up to 10x fewer tokens than full-context input, yet still retrieves 70% of relevant passages. It improves accuracy across five LCLMs and two embedding models, highlighting that dynamically adjusting context size leads to more efficient and accurate QA.
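The selection rule described above (rank passages by similarity to the query, then cut at the largest gap between consecutive scores) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function name `adaptive_k`, the optional `buffer` parameter, and the example scores are hypothetical, and the sketch assumes the query-to-passage similarities have already been computed by some embedding model.

```python
from typing import List, Sequence


def adaptive_k(scores: Sequence[float], buffer: int = 0) -> List[int]:
    """Return indices of the passages to keep, in descending score order.

    Illustrative sketch only: the ranked list is cut at the largest drop
    between consecutive similarity scores. `buffer` is a hypothetical knob
    that keeps a few extra passages past the cut.
    """
    if not scores:
        return []
    # Rank passage indices from most to least similar to the query.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    if len(order) == 1:
        return order
    ranked = [scores[i] for i in order]
    # Gap i is the drop from the passage at rank i to the one at rank i + 1.
    gaps = [ranked[i] - ranked[i + 1] for i in range(len(ranked) - 1)]
    # Position of the largest drop (the first one if there are ties).
    cut = max(range(len(gaps)), key=lambda i: gaps[i])
    k = min(len(order), cut + 1 + buffer)
    return order[:k]


# Made-up similarities: three passages clearly stand out from the rest.
sims = [0.21, 0.83, 0.19, 0.79, 0.22, 0.81]
selected = adaptive_k(sims)
print(selected)  # indices of the passages above the largest gap
```

Because the cut depends only on the score distribution, the number of selected passages varies per query without any tuning or additional LLM calls; a query with few strongly matching passages yields a small context, while one with many comparably relevant passages yields a larger one.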
