GrowOVER: How Can LLMs Adapt to Growing Real-World Knowledge?

Dayoon Ko, Jinyoung Kim, Hahyeon Choi, Gunhee Kim

TL;DR

GrowOVER introduces dynamic QA and dialogue benchmarks (GrowOVER-QA and GrowOVER-Dialogue) that track the evolution of knowledge using annotated evidence text. It also presents RiLM, a training-free retrieval-interactive framework in which an LLM evaluates its own answers and guides re-retrieval through a certainty classifier and adaptive retrieval. In the reported experiments, RiLM performs comparably to, and in some cases surpasses, continuously trained LLMs, suggesting that a model's own reliability signals can guide retrieval well enough to cope with knowledge changes. The work emphasizes the practical importance of dynamic benchmarks and retrieval-augmented strategies for maintaining accuracy in rapidly evolving domains.

Abstract

In the real world, knowledge is constantly evolving, which can render existing knowledge-based datasets outdated. This unreliability highlights the critical need for continuous updates to ensure both accuracy and relevance in knowledge-intensive tasks. To address this, we propose GrowOVER-QA and GrowOVER-Dialogue, dynamic open-domain QA and dialogue benchmarks that undergo a continuous cycle of updates, keeping pace with the rapid evolution of knowledge. Our research indicates that retrieval-augmented language models (RaLMs) struggle with knowledge that they have not been trained on or that has been recently updated. Consequently, we introduce a novel retrieval-interactive language model framework, where the language model evaluates and reflects on its answers for further re-retrieval. Our exhaustive experiments demonstrate that our training-free framework significantly improves upon existing methods, performing comparably to or even surpassing continuously trained language models.

Paper Structure (30 sections, 3 equations, 4 figures, 25 tables, 5 algorithms)

Figures (4)

  • Figure 1: An illustration of GrowOVER benchmarks. GrowOVER is automatically generated and continuously updated. It provides the evidence text to evaluate the retriever and also comprehensively evaluates the generator through an open-domain dialogue task.
  • Figure 2: The overview of the dataset generation process. For detailed explanations, see the sections covering initial generation through updates.
  • Figure 3: The RiLM framework. Given a query, we retrieve the top-$k$ documents and send $k$ prompts to the LLM in parallel. The certainty classifier predicts reliable, misleading, or uncertain for each prompt. If the prediction is reliable, the Decision Gate adopts the answer. Otherwise, we return to the retrieval step with the LLM's output and the reliable probability. In Adaptive Re-retrieval, the retriever uses this information to improve retrieval, and the LLM re-generates answers from the newly retrieved documents.
  • Figure 4: Article Categories Overview
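
The control flow described in the Figure 3 caption can be sketched as a small loop. This is only an illustrative sketch, not the authors' implementation: every name here (`rilm_answer`, `toy_retrieve`, `toy_generate`, `toy_classify`, the three-sentence corpus) is hypothetical. The token-overlap retriever, the year-based "classifier", the rule of skipping already-tried documents, the round limit, and the choice of the highest-probability reliable answer are all assumptions made so the example runs on its own. The prompts are also issued sequentially here, whereas the caption describes them as parallel.

```python
import re
from dataclasses import dataclass
from typing import Callable, List, Optional, Set, Tuple

RELIABLE, MISLEADING, UNCERTAIN = "reliable", "misleading", "uncertain"


@dataclass
class Candidate:
    document: str
    answer: str
    label: str         # one of RELIABLE, MISLEADING, UNCERTAIN
    p_reliable: float  # classifier's probability of "reliable"


def rilm_answer(
    query: str,
    retrieve: Callable[[str, List[Candidate], int], List[str]],
    generate: Callable[[str, str], str],
    classify: Callable[[str, str, str], Tuple[str, float]],
    k: int = 1,
    max_rounds: int = 3,  # assumption: the caption does not state a round limit
) -> Optional[Candidate]:
    """Retrieve, generate, classify; re-retrieve until some answer is reliable."""
    feedback: List[Candidate] = []  # rejected outputs passed back to the retriever
    for _ in range(max_rounds):
        docs = retrieve(query, feedback, k)
        if not docs:
            break
        candidates = []
        for doc in docs:  # one prompt per retrieved document
            answer = generate(query, doc)
            label, p = classify(query, doc, answer)
            candidates.append(Candidate(doc, answer, label, p))
        reliable = [c for c in candidates if c.label == RELIABLE]
        if reliable:
            # Decision gate: adopt a reliable answer (tie-break is an assumption).
            return max(reliable, key=lambda c: c.p_reliable)
        feedback.extend(candidates)
    return None


# ---- Toy stand-ins so the sketch runs; none of this comes from the paper ----
CORPUS = [
    "Kim is the Acme chief executive, according to a 2019 report.",
    "In 2024 Lee replaced Kim at Acme.",
    "Acme sells anvils.",
]
NAMES = {"Kim", "Lee"}


def tokens(text: str) -> Set[str]:
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def toy_retrieve(query: str, feedback: List[Candidate], k: int) -> List[str]:
    """Rank untried documents by overlap with the query and earlier outputs."""
    tried = {c.document for c in feedback}

    def score(doc: str) -> float:
        doc_tokens = tokens(doc)
        s = float(len(tokens(query) & doc_tokens))
        for c in feedback:  # earlier outputs, weighted by their reliable probability
            s += c.p_reliable * len(tokens(c.answer) & doc_tokens)
        return s

    pool = [d for d in CORPUS if d not in tried]
    return sorted(pool, key=score, reverse=True)[:k]


def toy_generate(query: str, doc: str) -> str:
    """Pretend LLM: answer with the first known name mentioned in the document."""
    for word in re.findall(r"[A-Za-z]+", doc):
        if word in NAMES:
            return word
    return "unknown"


def toy_classify(query: str, doc: str, answer: str) -> Tuple[str, float]:
    """Pretend certainty classifier: trust only documents dated 2023 or later."""
    if answer == "unknown":
        return MISLEADING, 0.1
    years = [int(y) for y in re.findall(r"\b(?:19|20)\d{2}\b", doc)]
    if years and max(years) >= 2023:
        return RELIABLE, 0.9
    return UNCERTAIN, 0.3


QUERY = "Who is the current Acme chief executive?"
result = rilm_answer(QUERY, toy_retrieve, toy_generate, toy_classify, k=1)
print(result.answer if result else "no reliable answer")
```

In this toy run, the first round retrieves the 2019 report and the answer is labeled uncertain. That output is fed back to the retriever, which surfaces the 2024 document in the second round, and the decision gate adopts the resulting answer. A real system would replace the three toy callables with an actual retriever, LLM, and trained certainty classifier.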