Accumulating Context Changes the Beliefs of Language Models
Authors
Jiayi Geng, Howard Chen, Ryan Liu, Manoel Horta Ribeiro, Robb Willer, Graham Neubig, Thomas L. Griffiths
Abstract
Language model (LM) assistants are increasingly used in applications such as
brainstorming and research. Improvements in memory and context size have
allowed these models to become more autonomous, so more text now accumulates in
their context windows without explicit user intervention.
This comes with a latent risk: the belief profiles of models -- their
understanding of the world as manifested in their responses or actions -- may
silently change as context accumulates. This can lead to subtly inconsistent
user experiences, or shifts in behavior that deviate from the original
alignment of the models. In this paper, we explore how accumulating context by
engaging in interactions and processing text -- talking and reading -- can
change the beliefs of language models. Our results reveal that models' belief
profiles are highly
malleable: GPT-5 exhibits a 54.7% shift in its stated beliefs after 10 rounds
of discussion about moral dilemmas and queries about safety, while Grok 4 shows
a 27.2% shift on political issues after reading texts from the opposing
position. We also examine models' behavioral changes by designing tasks that
require tool use, where each tool selection corresponds to an implicit belief.
We find that these changes align with stated belief shifts, suggesting that
belief shifts will be reflected in actual behavior in agentic systems. Our
analysis exposes the hidden risk of belief shift as models undergo extended
sessions of talking or reading, rendering their opinions and actions
unreliable.