Intent Mismatch Causes LLMs to Get Lost in Multi-Turn Conversation
Geng Liu, Fei Zhu, Rong Feng, Changyi Ma, Shiqi Wang, Gaofeng Meng
TL;DR
This work investigates the Lost in Conversation (LiC) phenomenon, in which LLMs struggle to realign with user intent across multi-turn dialogues. It reframes LiC as an intent alignment gap rather than a pure capability deficit and introduces a Mediator-Assistant framework, complemented by a Refiner that distills historical interaction patterns into explicit instructions. By decoupling intent understanding from task execution and applying experience-driven refinement upstream of the Assistant, the approach reduces input entropy and grounds model responses in the user's true goals. Empirical results across diverse backbones and domains show substantial recovery of multi-turn performance and improved reliability, demonstrating the practical potential of user-aware intent modeling in conversational AI.
Abstract
Multi-turn conversation has emerged as a predominant interaction paradigm for Large Language Models (LLMs). Users often employ follow-up questions to refine their intent, expecting LLMs to adapt dynamically. However, recent research reveals that LLMs suffer a substantial performance drop in multi-turn settings compared to single-turn interactions with fully specified instructions, a phenomenon termed "Lost in Conversation" (LiC). While prior work attributes LiC to model unreliability, we argue that the root cause lies in an intent alignment gap rather than intrinsic capability deficits. In this paper, we first demonstrate that LiC is not a failure of model capability but rather a breakdown in interaction between users and LLMs. We theoretically show that scaling model size or improving training alone cannot resolve this gap, as it arises from structural ambiguity in conversational context rather than representational limitations. To address this, we propose to decouple intent understanding from task execution through a Mediator-Assistant architecture. By utilizing an experience-driven Mediator that rewrites user inputs into explicit, well-structured instructions based on historical interaction patterns, our approach effectively bridges the gap between vague user intent and model interpretation. Experimental results demonstrate that this method significantly mitigates performance degradation in multi-turn conversations across diverse LLMs.
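
To make the decoupling described in the abstract concrete, the following is a minimal sketch of a Mediator-Assistant loop, not the authors' implementation. It assumes that each model can be treated as a plain function from prompt string to completion string; the class name `MediatorAssistant`, the prompt wording, the `experience` list of distilled guidelines, and the two stub models are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# Assumption: any LLM is modeled as "prompt in, completion out".
LLM = Callable[[str], str]


@dataclass
class MediatorAssistant:
    mediator_llm: LLM    # rewrites the dialogue into one explicit instruction
    assistant_llm: LLM   # executes that instruction; never sees the raw dialogue
    experience: List[str] = field(default_factory=list)  # distilled guidelines (assumed format)
    history: List[str] = field(default_factory=list)     # prior turns, oldest first

    def build_mediator_prompt(self, user_turn: str) -> str:
        """Combine guidelines, past turns, and the new turn into one prompt."""
        guidelines = "\n".join(f"- {g}" for g in self.experience) or "- (none)"
        turns = "\n".join(self.history + [f"User: {user_turn}"])
        return (
            "Rewrite the conversation below as one explicit, "
            "self-contained instruction.\n"
            f"Guidelines from past interactions:\n{guidelines}\n"
            f"Conversation:\n{turns}\n"
            "Instruction:"
        )

    def step(self, user_turn: str) -> str:
        """Run one turn: intent understanding first, task execution second."""
        instruction = self.mediator_llm(self.build_mediator_prompt(user_turn))
        answer = self.assistant_llm(instruction)
        self.history.append(f"User: {user_turn}")
        self.history.append(f"Assistant: {answer}")
        return answer


# Stub models so the sketch runs without any API; a real setup would call LLMs here.
def stub_mediator(prompt: str) -> str:
    # Stand-in for intent consolidation: concatenate every user turn in order.
    prefix = "User: "
    user_turns = [ln[len(prefix):] for ln in prompt.splitlines() if ln.startswith(prefix)]
    return " ".join(user_turns)


def stub_assistant(instruction: str) -> str:
    return f"[answer to] {instruction}"


pipeline = MediatorAssistant(
    stub_mediator, stub_assistant,
    experience=["Carry earlier constraints forward."],
)
first = pipeline.step("Write a function that sorts a list.")
second = pipeline.step("Make it descending.")
print(second)
```

In this sketch the Assistant only ever receives the Mediator's consolidated instruction, so the underspecified follow-up "Make it descending." reaches it together with the original request rather than as an isolated fragment.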
