Table of Contents
Fetching ...

Do LLMs Benefit From Their Own Words?

Jenny Y. Huang, Leshem Choshen, Ramon Astudillo, Tamara Broderick, Jacob Andreas

TL;DR

This work compares in-the-wild, multi-turn prompting with a user-turn-only prompting approach that omits all previous assistant responses, and suggests that selectively omitting assistant history can improve response quality while reducing memory consumption.

Abstract

Multi-turn interactions with large language models typically retain the assistant's own past responses in the conversation history. In this work, we revisit this design choice by asking whether large language models benefit from conditioning on their own prior responses. Using in-the-wild, multi-turn conversations, we compare standard (full-context) prompting with a user-turn-only prompting approach that omits all previous assistant responses, across three open reasoning models and one state-of-the-art model. To our surprise, we find that removing prior assistant responses does not affect response quality on a large fraction of turns. Omitting assistant-side history can reduce cumulative context lengths by up to 10x. To explain this result, we find that multi-turn conversations consist of a substantial proportion (36.4%) of self-contained prompts, and that many follow-up prompts provide sufficient instruction to be answered using only the current user turn and prior user turns. When analyzing cases where user-turn-only prompting substantially outperforms full context, we identify instances of context pollution, in which models over-condition on their previous responses, introducing errors, hallucinations, or stylistic artifacts that propagate across turns. Motivated by these findings, we design a context-filtering approach that selectively omits assistant-side context. Our findings suggest that selectively omitting assistant history can improve response quality while reducing memory consumption.

Do LLMs Benefit From Their Own Words?

TL;DR

This work compares in-the-wild, multi-turn prompting with a user-turn-only prompting approach that omits all previous assistant responses, and suggests that selectively omitting assistant history can improve response quality while reducing memory consumption.

Abstract

Multi-turn interactions with large language models typically retain the assistant's own past responses in the conversation history. In this work, we revisit this design choice by asking whether large language models benefit from conditioning on their own prior responses. Using in-the-wild, multi-turn conversations, we compare standard (full-context) prompting with a user-turn-only prompting approach that omits all previous assistant responses, across three open reasoning models and one state-of-the-art model. To our surprise, we find that removing prior assistant responses does not affect response quality on a large fraction of turns. Omitting assistant-side history can reduce cumulative context lengths by up to 10x. To explain this result, we find that multi-turn conversations consist of a substantial proportion (36.4%) of self-contained prompts, and that many follow-up prompts provide sufficient instruction to be answered using only the current user turn and prior user turns. When analyzing cases where user-turn-only prompting substantially outperforms full context, we identify instances of context pollution, in which models over-condition on their previous responses, introducing errors, hallucinations, or stylistic artifacts that propagate across turns. Motivated by these findings, we design a context-filtering approach that selectively omits assistant-side context. Our findings suggest that selectively omitting assistant history can improve response quality while reducing memory consumption.
Paper Structure (57 sections, 9 figures, 2 tables)

This paper contains 57 sections, 9 figures, 2 tables.

Figures (9)

  • Figure 1: Real example of overrelying on irrelevant context. In Turn 2, the user requests UMAP clustering code. In Turn 5, the user says, "use t-SNE instead." Left: When the previous assistant response remains in context, the model incorrectly carries over the Jaccard metric from UMAP into the t-SNE implementation, introducing a bug. Right: Without the previous response in context, the model generates correct t-SNE code with appropriate arguments.
  • Figure 2: Pairwise win rates between Full-Context (FC) and Assistant-Omitted (AO) context responses across all four models (Qwen3-4B, DeepSeek-R1-Distill-Llama-8B, GPT-OSS-20B, and GPT-5.2) evaluated on two real-world conversational datasets (WildChat and ShareLM). Plot (a) shows evaluations under an LLM-judge that sees both the past user and assistant turns for context; Plot (b) shows evaluations under an LLM-judge that sees only the past user turns. Error bars indicate binomial proportion 95% confidence intervals.
  • Figure 3: Pairwise win rates by prompt category (new ask, follow up with feedback, follow up without feedback) for Qwen3-4B (top) and GPT-5.2 (bottom), comparing Full-Context (FC) and Assistant-Omitted (AO) responses. Stars indicate statistically significant differences. Error bars indicate binomial proportion 95% confidence intervals.
  • Figure 4: Three example conversations on WildChat illustrating different types of follow-up prompts. Left: the user provides feedback that is concrete enough that the model can respond from scratch using the previous user turns and the updated specifications alone. Middle: the follow-ups reference specific parts in past assistant turns, making it necessary to see the referenced assistant turn. Right: the follow-up references a prior user turn; no assistant history is necessary.
  • Figure 5: Ratio of adaptive wins to full-context-only wins plotted against the mean token count (in thousands) for different inclusion thresholds. Ties are counted as wins for all plotted configurations. Each blue point corresponds to a different threshold $\tau$ on $P(\mathrm{FC} \succ \mathrm{AO})$. The green triangle marks a heuristic that omits assistant responses on all "new ask" turns.
  • ...and 4 more figures