
Old Habits Die Hard: How Conversational History Geometrically Traps LLMs

Adi Simhi, Fazl Barez, Martin Tutek, Yonatan Belinkov, Shay B. Cohen

TL;DR

This work introduces History-Echoes, a framework that investigates how conversational history biases subsequent generations, and demonstrates that behavioral persistence manifests as a geometric trap, where gaps in the latent space confine the model's trajectory.

Abstract

How does the conversational past of large language models (LLMs) influence their future performance? Recent work suggests that LLMs are affected by their conversational history in unexpected ways. For instance, hallucinations in prior interactions may influence subsequent model responses. In this work, we introduce History-Echoes, a framework that investigates how conversational history biases subsequent generations. The framework explores this bias from two perspectives: probabilistically, we model conversations as Markov chains to quantify state consistency; geometrically, we measure the consistency of consecutive hidden representations. Across three model families and six datasets spanning diverse phenomena, our analysis reveals a strong correlation between the two perspectives. By bridging these perspectives, we demonstrate that behavioral persistence manifests as a geometric trap, where gaps in the latent space confine the model's trajectory. Code available at https://github.com/technion-cs-nlp/OldHabitsDieHard.
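The probabilistic perspective can be illustrated with a minimal sketch. It assumes each conversation turn has already been labeled with a discrete state (here a hypothetical binary label: 1 when the phenomenon, such as a hallucination, occurs and 0 otherwise) and estimates a first-order transition matrix by counting. The trace of that matrix sums the self-transition probabilities, so a larger trace means the model tends to stay in its previous state. The function names and the toy data are illustrative assumptions and are not taken from the authors' code.

```python
from collections import Counter


def transition_matrix(conversations, n_states=2):
    """Maximum-likelihood first-order transition matrix from state sequences."""
    counts = Counter()
    for states in conversations:
        # Count each consecutive (previous state, next state) pair.
        for prev, nxt in zip(states, states[1:]):
            counts[(prev, nxt)] += 1
    matrix = []
    for i in range(n_states):
        row_total = sum(counts[(i, j)] for j in range(n_states))
        # Rows with no observations fall back to a uniform distribution.
        row = [counts[(i, j)] / row_total if row_total else 1.0 / n_states
               for j in range(n_states)]
        matrix.append(row)
    return matrix


def trace(matrix):
    """Sum of the self-transition probabilities on the diagonal."""
    return sum(matrix[i][i] for i in range(len(matrix)))


# Hypothetical per-turn state labels for three conversations (toy data).
toy_conversations = [
    [0, 0, 0, 1, 1, 1],
    [1, 1, 1, 1, 0, 0],
    [0, 0, 1, 1, 1, 1],
]
T = transition_matrix(toy_conversations)
print(T, trace(T))
```

With two states the trace lies between 0 and 2, and it equals 1 exactly when both rows of the matrix are identical, that is, when the next state does not depend on the current one. Values above 1 therefore indicate persistence of the previous state.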

Paper Structure (44 sections, 10 equations, 12 figures, 10 tables)

Figures (12)

  • Figure 1: The geometric trap of past context. We correlate external behavior with latent geometry to evaluate the extent of carryover effects. The probabilistic perspective (§\ref{sec:Probabilistic}) measures the model's state consistency, while the geometric perspective (§\ref{sec: geometric}) measures the orthogonality and dynamics of the latent representations of phenomenon states. We find that probabilistic consistency correlates with the geometric trap, in which the phenomenon states are separated by a large angle. Samples are illustrative and do not correspond to real data.
  • Figure 2: $\text{Tr}(\mathbf{T})$ and $\theta_{\text{ref}}$ are strongly correlated (Spearman $0.78$, $p<0.0002$) across all models and datasets.
  • Figure 3: The relation between $\text{Tr}(\mathbf{T})$ and $\theta_{\text{ref}}$ for all models and datasets using $D_{\text{inconsistent}}$. While $\theta_{\text{ref}}$ values remain similar to those in $D_{\text{consistent}}$ (\ref{fig:correlation_theta_tr}), their relationship with the trace changes: as $\theta_{\text{ref}}$ increases, there is only a marginal increase in $\text{Tr}(\mathbf{T})$.
  • Figure 4: The effect of Markov order on $\Delta_k$ averaged across models. The first step exhibits the strongest effect, which diminishes but is non-negligible for two and three steps in the past.
  • Figure 5: Spearman correlation between the geometric and probabilistic perspectives across layers. All layers exhibit strong correlation, with the highest correlation observed in the upper layers.
  • ...and 7 more figures
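
The captions above relate an angle between state representations to the trace of the transition matrix through a Spearman correlation. The sketch below shows one way such quantities could be computed. It assumes that each phenomenon state is summarized by a single direction vector and that the angle is taken between two such vectors; the precise definition of $\theta_{\text{ref}}$ is not given in this excerpt, so this is an illustrative stand-in. All names and numbers are hypothetical toy values, not results from the paper.

```python
import math


def angle_degrees(u, v):
    """Angle between two vectors, in degrees."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    # Clamp to [-1, 1] to guard against floating-point drift.
    cos = max(-1.0, min(1.0, dot / (norm_u * norm_v)))
    return math.degrees(math.acos(cos))


def ranks(values):
    """Ranks starting at 1; this sketch assumes there are no tied values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def spearman(x, y):
    """Spearman correlation as the Pearson correlation of the ranks (no ties)."""
    rx, ry = ranks(x), ranks(y)
    mean = (len(x) + 1) / 2
    cov = sum((a - mean) * (b - mean) for a, b in zip(rx, ry))
    var = sum((a - mean) ** 2 for a in rx)
    return cov / var


# Toy values: one (angle, trace) pair per hypothetical model/dataset setting.
toy_angles = [10.0, 35.0, 60.0, 80.0]
toy_traces = [1.1, 1.0, 1.5, 1.8]
print(angle_degrees([1.0, 0.0], [0.0, 1.0]))
print(spearman(toy_angles, toy_traces))
```

Because Spearman correlation depends only on ranks, it captures whether larger angles go with larger traces without assuming the relationship is linear.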