Towards Better Generalization in Open-Domain Question Answering by Mitigating Context Memorization

Zixuan Zhang; Revanth Gangi Reddy; Kevin Small; Tong Zhang; Heng Ji

TL;DR

This work addresses how well Open-Domain Question Answering (OpenQA) models generalize when the knowledge corpus is updated or the domain shifts. It identifies the reader's over-memorization of retrieved documents as a key bottleneck and introduces Corpus-Invariant Tuning (CIT), a regularization loss that constrains the reader's likelihood of retrieved contexts during training. By combining $L_{QA}$ with an auxiliary $L_{CIT}$ term that uses Masked Span Prediction probabilities, CIT trains models to rely on retrieved evidence rather than on memorized corpus content. Extensive experiments on NQ, TriviaQA, and RobustQA show that CIT substantially improves cross-version and cross-domain generalization while preserving or improving in-domain performance, and that it also improves retrieval performance and evidence coverage. The approach offers a practical, parameterizable way to improve generalization in retrieval-augmented OpenQA systems, and it applies to encoder-decoder architectures and even to decoder-only models used with prompts.

Abstract

Open-domain Question Answering (OpenQA) aims at answering factual questions with an external large-scale knowledge corpus. However, real-world knowledge is not static; it updates and evolves continually. Such a dynamic characteristic of knowledge poses a vital challenge for these models, as the trained models need to constantly adapt to the latest information to make sure that the answers remain accurate. In addition, it is still unclear how well an OpenQA model can transfer to completely new knowledge domains. In this paper, we investigate the generalization performance of a retrieval-augmented QA model in two specific scenarios: 1) adapting to updated versions of the same knowledge corpus; 2) switching to completely different knowledge domains. We observe that the generalization challenges of OpenQA models stem from the reader's over-reliance on memorizing the knowledge from the external corpus, which hinders the model from generalizing to a new knowledge corpus. We introduce Corpus-Invariant Tuning (CIT), a simple but effective training strategy, to mitigate the knowledge over-memorization by controlling the likelihood of retrieved contexts during training. Extensive experimental results on multiple OpenQA benchmarks show that CIT achieves significantly better generalizability without compromising the model's performance in its original corpus and domain.

Paper Structure (33 sections, 7 equations, 3 figures, 10 tables)

Introduction
Preliminaries
Problem Formulation
Retrieval Augmentation
Evaluation of Model Generalization
Evaluations of RQ1
Evaluations of RQ2
Corpus-Invariant Tuning
Validation
Corpus-Invariant Tuning (CIT)
Discussion
Experiments
Data
NQ and TriviaQA
RobustQA
...and 18 more sections

Figures (3)

Figure 1: Our proposed Corpus-Invariant Tuning (CIT) framework. In addition to the existing loss for question answering, we introduce an auxiliary CIT loss so that the reader does not over-memorize the retrieved contexts. Specifically, given each batch of QA pairs and the relevant documents retrieved from the corpus, the CIT loss ensures that the reader's likelihood of these documents does not increase.
Figure 2: The result heatmaps for cross-domain generalization experiments. Each value in the heatmap represents the absolute improvement (compared with Atlas-XL) of cross-domain relative performance (CRP) defined in Equation \ref{eqn:crp}. Darker green indicates larger improvements in cross-domain generalization.
Figure 3: Parameter sensitivity on choices of $\alpha$.
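
The Figure 1 caption describes a QA loss combined with an auxiliary term that keeps the reader's likelihood of retrieved documents from increasing. The sketch below is one way such an objective could look, not the paper's exact formulation: the hinge form of the penalty, the function names `cit_penalty` and `total_loss`, the use of a frozen reference likelihood, and the reading of `alpha` as the weight on the auxiliary term are all assumptions made for illustration. The document log-likelihoods are taken as given numbers; in the paper they would come from Masked Span Prediction probabilities.

```python
from typing import Sequence


def cit_penalty(logp_now: Sequence[float], logp_ref: Sequence[float]) -> float:
    """Hypothetical hinge-style penalty (an assumption, not the paper's formula).

    logp_now: current reader log-likelihood of each retrieved document.
    logp_ref: log-likelihood of the same documents under a reference
              (for example, the reader before fine-tuning).
    The penalty is positive only for documents whose likelihood has increased.
    """
    if len(logp_now) != len(logp_ref):
        raise ValueError("logp_now and logp_ref must have the same length")
    if not logp_now:
        return 0.0
    increases = [max(0.0, now - ref) for now, ref in zip(logp_now, logp_ref)]
    return sum(increases) / len(increases)


def total_loss(qa_loss: float,
               logp_now: Sequence[float],
               logp_ref: Sequence[float],
               alpha: float = 0.1) -> float:
    """Combine the QA loss with the auxiliary penalty, weighted by alpha (assumed role)."""
    return qa_loss + alpha * cit_penalty(logp_now, logp_ref)


if __name__ == "__main__":
    # First document became more likely (penalized); second became less likely (ignored).
    print(total_loss(1.0, [-2.0, -5.0], [-3.0, -4.0], alpha=0.5))
```

With `alpha = 0` the objective reduces to the plain QA loss, so under these assumptions the weight directly controls how strongly memorization of retrieved contexts is discouraged.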
