Intent Mismatch Causes LLMs to Get Lost in Multi-Turn Conversation

Geng Liu, Fei Zhu, Rong Feng, Changyi Ma, Shiqi Wang, Gaofeng Meng

TL;DR

This work investigates the Lost in Conversation (LiC) phenomenon where LLMs struggle to realign with user intent across multi-turn dialogues. It reframes LiC as an intent alignment gap rather than a pure capability deficit and introduces a Mediator-Assistant framework complemented by a Refiner that distills historical interaction patterns into explicit instructions. By decoupling intent understanding from task execution and using an experience-driven upstream refinement, the approach reduces input entropy and grounds model responses in the user's true goals. Empirical results across diverse backbones and domains show substantial recovery of multi-turn performance and improved reliability, demonstrating the practical potential of user-aware intent modeling in conversational AI.

Abstract

Multi-turn conversation has emerged as a predominant interaction paradigm for Large Language Models (LLMs). Users often employ follow-up questions to refine their intent, expecting LLMs to adapt dynamically. However, recent research reveals that LLMs suffer a substantial performance drop in multi-turn settings compared to single-turn interactions with fully specified instructions, a phenomenon termed "Lost in Conversation" (LiC). While this prior work attributes LiC to model unreliability, we argue that the root cause lies in an intent alignment gap rather than intrinsic capability deficits. In this paper, we first demonstrate that LiC is not a failure of model capability but rather a breakdown in interaction between users and LLMs. We theoretically show that scaling model size or improving training alone cannot resolve this gap, as it arises from structural ambiguity in conversational context rather than representational limitations. To address this, we propose to decouple intent understanding from task execution through a Mediator-Assistant architecture. By utilizing an experience-driven Mediator to explicate user inputs into explicit, well-structured instructions based on historical interaction patterns, our approach effectively bridges the gap between vague user intent and model interpretation. Experimental results demonstrate that this method significantly mitigates performance degradation in multi-turn conversations across diverse LLMs.

Paper Structure (31 sections, 8 equations, 7 figures, 2 tables)

Figures (7)

  • Figure 1: Intent Mismatch in Multi-turn Dialogue. (Left) The LiC benchmark simulates passive users who act as "lazy" interlocutors, omitting corrections for erroneous model assumptions. This behavior causes the Assistant's interpretation to progressively drift away from the user's true intent, leading to significant performance degradation. (Right) Our approach introduces a Mediator to bridge this pragmatic gap by fundamentally decoupling intent inference from task execution. The Mediator aligns the Assistant with the user's true goals, effectively mitigating performance degradation.
  • Figure 2: Performance comparison across different LLMs on the LiC benchmark (Laban et al., 2025). While absolute performance improves with model scale, the relative performance degradation remains strikingly constant ($\sim$60%). This structural invariance suggests that the bottleneck lies in the alignment prior rather than model capacity.
  • Figure 3: Pipeline of the Mediator Framework. We construct contrastive pairs by extracting a failed conversational trajectory $D^{-}$ and the corresponding successful trajectory $D^{+}$ for the same task instance from the user's historical logs. The Refiner distills these pairs into explicit pragmatic experiences $\mathcal{E}$, which guide the Mediator to explicate ambiguous user contexts into precise instructions for the Assistant. The Mediator operates as a transparent alignment layer, decoupling the user from the raw execution model while maintaining a seamless interaction flow.
  • Figure 4: Comparison with In-Context Learning. We compare our method against the Oracle baseline and a Direct ICL approach. Our method delivers a substantial performance boost over the Oracle with only a marginal cost increase. While ICL achieves comparable accuracy to Ours, it consumes $3.6 \times$ more tokens.
  • ...and 2 more figures
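
The pipeline described in the Figure 3 caption can be illustrated with a minimal Python sketch. It is not the authors' implementation: the names `ContrastivePair`, `refine`, `mediate`, and `respond`, the prompt wording, and the choice to pass only the Mediator's instruction to the Assistant are all assumptions made for illustration. Each model is represented by a plain callable that maps a prompt string to a response string, so any LLM client could be substituted.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical model interface: prompt text in, response text out.
LLM = Callable[[str], str]


@dataclass
class ContrastivePair:
    """A failed trajectory (D^-) and a successful one (D^+) for the same task."""
    failed: List[str]
    succeeded: List[str]


def refine(pairs: List[ContrastivePair], refiner: LLM) -> List[str]:
    """Distill each contrastive pair into one explicit experience (an element of E)."""
    experiences = []
    for pair in pairs:
        # Prompt wording is an assumption, not taken from the paper.
        prompt = (
            "Failed conversation:\n" + "\n".join(pair.failed)
            + "\n\nSuccessful conversation:\n" + "\n".join(pair.succeeded)
            + "\n\nState one guideline about what this user means."
        )
        experiences.append(refiner(prompt).strip())
    return experiences


def mediate(user_turns: List[str], experiences: List[str], mediator: LLM) -> str:
    """Rewrite the ambiguous multi-turn context as a single explicit instruction."""
    prompt = (
        "Guidelines:\n" + "\n".join(f"- {e}" for e in experiences)
        + "\n\nUser turns so far:\n" + "\n".join(user_turns)
        + "\n\nRewrite these turns as one fully specified instruction."
    )
    return mediator(prompt).strip()


def respond(user_turns: List[str], experiences: List[str],
            mediator: LLM, assistant: LLM) -> str:
    """Intent inference (Mediator) runs first; task execution (Assistant) second."""
    instruction = mediate(user_turns, experiences, mediator)
    # Assumption: the Assistant receives only the explicated instruction.
    return assistant(instruction)
```

In this sketch, `refine` would run offline over a user's historical logs, while `respond` would run once per turn, which keeps the Assistant itself unchanged and places all intent inference upstream of it.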