Format Inertia: A Failure Mechanism of LLMs in Medical Pre-Consultation
Seungseop Lim, Gibaeg Kim, Wooseok Han, Jean Seo, Hyunkyung Lee, Jaehyo Yoo, Eunho Yang
TL;DR
This work investigates Format Inertia, a failure mode in which LLMs trained on medical dialogues with a skewed turn-count distribution generate repetitive, format-compliant but diagnostically uninformative questions in long conversations. It introduces a Uniform Turn-Count Dataset as a simple data-centric mitigation that balances exposure to short and long dialogues, improving adherence to context over long conversations. Empirical results show that training on uniformly distributed turn counts recovers or improves Task-Constraint Satisfaction Rates (TCSR) while maintaining Format-Constraint Satisfaction Rates (FCSR) across multiple models and scales; for example, Gemma-3 (4B) achieves an FCSR of 0.967 and a TCSR of 0.891 with 1k uniform turn-count training samples. The findings underscore the critical role of the training data's turn-count distribution in multi-turn medical dialogue systems and advocate balanced data curation to ensure clinically meaningful interactions in deployment.
Abstract
Recent advances in Large Language Models (LLMs) have brought significant improvements to various service domains, including chatbots and medical pre-consultation applications. In the healthcare domain, the most common approach for adapting LLMs to multi-turn dialogue generation is Supervised Fine-Tuning (SFT). However, datasets for SFT in tasks like medical pre-consultation typically exhibit a skewed turn-count distribution. Training on such data induces a novel failure mechanism we term Format Inertia, where models tend to generate repetitive, format-correct, but diagnostically uninformative questions in long medical dialogues. To mitigate this observed failure mechanism, we adopt a simple, data-centric method that rebalances the turn-count distribution of the training dataset. Experimental results show that our approach substantially alleviates Format Inertia in medical pre-consultation.
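The rebalancing idea can be illustrated with a minimal sketch. It assumes each dialogue is a list of turns and that rebalancing is done by resampling dialogues so every observed turn count appears equally often; the function name `uniform_turn_count_sample` and the resampling-with-replacement strategy are illustrative assumptions, not the authors' published procedure, which may construct the Uniform Turn-Count Dataset differently.

```python
import random
from collections import defaultdict


def uniform_turn_count_sample(dialogues, n_samples, seed=0):
    """Resample dialogues so that each observed turn count is (almost) equally frequent.

    dialogues: list of dialogues, each a list of turns (hypothetical format).
    n_samples: size of the rebalanced training set.
    """
    rng = random.Random(seed)

    # Group dialogues by their number of turns.
    buckets = defaultdict(list)
    for dialogue in dialogues:
        buckets[len(dialogue)].append(dialogue)

    turn_counts = sorted(buckets)
    sampled = []
    for i in range(n_samples):
        # Cycle through the turn counts so each one is drawn equally often,
        # then pick a dialogue of that length (with replacement).
        k = turn_counts[i % len(turn_counts)]
        sampled.append(rng.choice(buckets[k]))

    rng.shuffle(sampled)  # avoid a fixed ordering by length
    return sampled
```

Sampling with replacement means rare long dialogues are repeated; in practice one might instead collect or synthesize more long dialogues, which this sketch does not attempt.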
