Memorization vs. Reasoning: Updating LLMs with New Knowledge

Aochong Oliver Li; Tanya Goyal

TL;DR

This paper introduces Knowledge Update Playground (KUP), a framework that automatically generates realistic knowledge updates, together with an evaluation suite (KUPEval) that tests LLM memorization of, and reasoning over, the updated facts. It also proposes Memory Conditioned Training (MCT), a lightweight continued pre-training (CPT) method that prefixes training data with memory tokens generated by the base model, encouraging the model to surface and reason over updates. Empirical results on two open-source LLMs show that KUP is challenging for standard CPT approaches, with indirect reasoning remaining particularly hard, while MCT significantly improves direct memorization and benefits from chain-of-thought strategies. The work highlights that current CPT methods largely memorize updates and struggle to reason about their implications, suggesting a need for future methods that better integrate updated knowledge into inferential processes. The authors provide open-source code and data to facilitate further research in continual knowledge updating for LLMs.

Abstract

Large language models (LLMs) encode vast amounts of pre-trained knowledge in their parameters, but updating them as real-world information evolves remains a challenge. Existing methodologies and benchmarks primarily target entity substitutions, failing to capture the full breadth of complex real-world dynamics. In this paper, we introduce Knowledge Update Playground (KUP), an automatic pipeline for simulating realistic knowledge updates reflected in an evidence corpus. KUP's evaluation framework includes direct and indirect probes that test, respectively, memorization of updated facts and reasoning over them, for any update-learning method. Next, we present a lightweight method called memory conditioned training (MCT), which conditions tokens in the update corpus on self-generated "memory" tokens during training. Our strategy encourages LLMs to surface and reason over newly memorized knowledge at inference. Our results on two strong LLMs show that (1) the KUP benchmark is highly challenging, with the best continued pre-training (CPT) models achieving $<2\%$ in the indirect probing setting (reasoning), and (2) MCT training significantly outperforms prior CPT baselines, improving direct probing (memorization) results by up to $25.4\%$.
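The conditioning step in MCT can be pictured with a minimal sketch. The abstract does not say how the memory tokens are generated or how the loss is computed, so everything below is an assumption made for illustration: the function name `build_mct_example` is hypothetical, the token ids are placeholders, and masking the memory prefix out of the loss is one plausible reading of "conditions tokens in the update corpus on" the memory tokens, not the authors' confirmed implementation.

```python
from typing import List, Tuple

# Label value skipped by common cross-entropy implementations
# (for example, the default ignore_index in PyTorch's CrossEntropyLoss).
IGNORE_INDEX = -100


def build_mct_example(
    memory_ids: List[int], update_ids: List[int]
) -> Tuple[List[int], List[int]]:
    """Prefix self-generated memory tokens to an update document.

    Hypothetical helper: returns (input_ids, labels). Labels for the memory
    prefix are masked (an assumption), so a language-modeling loss would be
    computed only on the update tokens, conditioned on the prefix.
    """
    input_ids = list(memory_ids) + list(update_ids)
    labels = [IGNORE_INDEX] * len(memory_ids) + list(update_ids)
    return input_ids, labels


# Placeholder ids: two "memory" tokens followed by three update-corpus tokens.
input_ids, labels = build_mct_example([1, 2], [3, 4, 5])
print(input_ids)
print(labels)
```

Under this reading, the base model would first be prompted to write what it already "remembers" about the entity, and that text would be tokenized into `memory_ids` before training on the update document.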
