Mitigating Temporal Misalignment by Discarding Outdated Facts

Michael J. Q. Zhang, Eunsol Choi

TL;DR

Identifying which facts are prone to rapid change helps models avoid reciting outdated information and determine which predictions require up-to-date knowledge sources. Modeling fact duration also improves calibration for knowledge-intensive tasks, such as open-retrieval question answering.

Abstract

While large language models are able to retain vast amounts of world knowledge seen during pretraining, such knowledge is prone to going out of date and is nontrivial to update. Furthermore, these models are often used under temporal misalignment, tasked with answering questions about the present, despite having only been trained on data collected in the past. To mitigate the effects of temporal misalignment, we propose fact duration prediction: the task of predicting how long a given fact will remain true. In our experiments, we demonstrate that identifying which facts are prone to rapid change can help models avoid reciting outdated information and determine which predictions require seeking out up-to-date knowledge sources. We also show how modeling fact duration improves calibration for knowledge-intensive tasks, such as open-retrieval question answering, under temporal misalignment, by discarding volatile facts. Our data and code are released publicly at https://github.com/mikejqzhang/mitigating_misalignment.
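
To make the idea of discarding volatile facts concrete, the following is a minimal sketch, not the authors' implementation. It assumes a duration predictor that outputs a probability for each of a few duration classes, each with a representative length in years; the function names, the class values, and the abstention threshold are all hypothetical. The QA system's confidence is scaled by the probability that the fact outlasts the gap between training and evaluation.

```python
from typing import List, Tuple


def survival_probability(
    duration_dist: List[Tuple[float, float]], misalignment_years: float
) -> float:
    """Probability mass on durations longer than the misalignment gap.

    duration_dist is a list of (duration_in_years, probability) pairs
    (a hypothetical output format for a duration classifier).
    """
    return sum(prob for years, prob in duration_dist if years > misalignment_years)


def adjust_confidence(
    qa_confidence: float,
    duration_dist: List[Tuple[float, float]],
    misalignment_years: float,
) -> float:
    """Scale the QA confidence by the chance that the fact is still true."""
    return qa_confidence * survival_probability(duration_dist, misalignment_years)


def should_abstain(adjusted_confidence: float, threshold: float = 0.5) -> bool:
    """Discard the prediction when adjusted confidence is below the threshold (assumed value)."""
    return adjusted_confidence < threshold


# Illustrative values only: a fact that most likely changes within months.
volatile_fact = [(0.5, 0.7), (5.0, 0.2), (50.0, 0.1)]
adjusted = adjust_confidence(0.9, volatile_fact, misalignment_years=3.0)
print(adjusted, should_abstain(adjusted))
```

Under this sketch, a confidently predicted but volatile fact receives a low adjusted confidence once the misalignment grows, which signals that an up-to-date knowledge source is needed.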

Paper Structure (53 sections, 3 figures, 10 tables)

Figures (3)

  • Figure 1: We depict the critical timestamps at play in open-retrieval QA systems. In the example on the left, the temporal misalignment between when the system was trained and evaluated has no effect on the answer. On the right, the answer has changed, causing the system to output an outdated answer with high confidence. To account for this, we apply our fact duration prediction system to adjust the system's confidence accordingly.
  • Figure 2: Duration statistics on each dataset's development set. Columns represent different duration classes used by our classification model, with units abbreviated as Seconds, Minutes, Days, Weeks, Months, Years, Decades, and Centuries. Cells contain the % of examples in each dataset in the column's duration class.
  • Figure 3: Fact Duration Prediction Results. On the left, we report our full results, with performance split by model type and training data. Performance on SituatedQA and TimeQA is given as the mean absolute error in years (Y) and the mean squared error in log-seconds (LS), the same as the regression system's training loss. On the right, we depict error histograms evaluated on SituatedQA, with systems trained on TimeQA.
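
The two metrics named in the Figure 3 caption can be sketched as follows. This is an illustrative reading, not the paper's evaluation code: it assumes that durations are stored in seconds, that the error in years (Y) is a mean absolute error, that a year is 365.25 days, and that the logarithm is natural. The function names are hypothetical.

```python
import math
from typing import Sequence

# Assumed conversion factor; the exact definition of a year is not given in the caption.
SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60


def mae_years(pred_seconds: Sequence[float], gold_seconds: Sequence[float]) -> float:
    """Mean absolute error between predicted and gold durations, reported in years."""
    errors = [abs(p - g) / SECONDS_PER_YEAR for p, g in zip(pred_seconds, gold_seconds)]
    return sum(errors) / len(errors)


def mse_log_seconds(pred_seconds: Sequence[float], gold_seconds: Sequence[float]) -> float:
    """Mean squared error between log-durations, with durations in seconds.

    Durations must be strictly positive for the logarithm to be defined.
    """
    errors = [(math.log(p) - math.log(g)) ** 2 for p, g in zip(pred_seconds, gold_seconds)]
    return sum(errors) / len(errors)


# Illustrative values only: predictions of 2 years and 1 day against gold 1 year and 1 week.
pred = [2 * SECONDS_PER_YEAR, 86400.0]
gold = [1 * SECONDS_PER_YEAR, 7 * 86400.0]
print(mae_years(pred, gold), mse_log_seconds(pred, gold))
```

Measuring squared error in log-space penalizes relative rather than absolute mistakes, so confusing days with weeks counts about as much as confusing years with decades.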