Format Inertia: A Failure Mechanism of LLMs in Medical Pre-Consultation

Seungseop Lim, Gibaeg Kim, Wooseok Han, Jean Seo, Hyunkyung Lee, Jaehyo Yoo, Eunho Yang

TL;DR

This work investigates Format Inertia, a failure mode in which LLMs fine-tuned on medical dialogues with a skewed turn-count distribution generate repetitive, format-compliant, but diagnostically uninformative questions in long conversations. It introduces a Uniform Turn-Count Dataset as a simple data-centric mitigation that balances exposure to short and long dialogues, improving adherence to task constraints over long contexts. Empirical results show that training on uniform turn-count data recovers or improves the Task-Constraint Satisfaction Rate (TCSR) while maintaining the Format-Constraint Satisfaction Rate (FCSR) across multiple models and scales; for example, Gemma-3 (4B) achieves an FCSR of 0.967 and a TCSR of 0.891 with 1k uniform turn-count training samples. The findings underscore the role of the training data's turn-count distribution in multi-turn medical dialogue systems and argue for balanced data curation to keep deployed interactions clinically meaningful.

Abstract

Recent advances in Large Language Models (LLMs) have brought significant improvements to various service domains, including chatbots and medical pre-consultation applications. In the healthcare domain, the most common approach for adapting LLMs to multi-turn dialogue generation is Supervised Fine-Tuning (SFT). However, datasets for SFT in tasks like medical pre-consultation typically exhibit a skewed turn-count distribution. Training on such data induces a novel failure mechanism we term Format Inertia, where models tend to generate repetitive, format-correct, but diagnostically uninformative questions in long medical dialogues. To mitigate this observed failure mechanism, we adopt a simple, data-centric method that rebalances the turn-count distribution of the training dataset. Experimental results show that our approach substantially alleviates Format Inertia in medical pre-consultation.
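The rebalancing idea can be illustrated with a minimal sketch. It is not the authors' procedure, whose details are not given in this summary: it assumes each dialogue is a list of turns and simply caps how many dialogues each turn count may contribute. The function name `rebalance_by_turn_count` and the parameter `per_bucket` are hypothetical.

```python
import random
from collections import Counter, defaultdict


def rebalance_by_turn_count(dialogues, per_bucket, seed=0):
    """Subsample so every turn count contributes at most `per_bucket` dialogues.

    Assumption: a dialogue is a list of turns, so len(dialogue) is its turn count.
    """
    rng = random.Random(seed)

    # Group dialogues by their number of turns.
    buckets = defaultdict(list)
    for dialogue in dialogues:
        buckets[len(dialogue)].append(dialogue)

    # Draw the same number of dialogues from each bucket (or all of them
    # if the bucket is smaller than the cap).
    balanced = []
    for turn_count in sorted(buckets):
        group = buckets[turn_count]
        k = min(per_bucket, len(group))
        balanced.extend(rng.sample(group, k))
    return balanced


# Toy skewed corpus: many short dialogues, few long ones (made-up data).
skewed = (
    [["q"] * 2 for _ in range(50)]
    + [["q"] * 6 for _ in range(10)]
    + [["q"] * 12 for _ in range(3)]
)
balanced = rebalance_by_turn_count(skewed, per_bucket=3)
turn_histogram = Counter(len(d) for d in balanced)
```

In this toy corpus the 2-turn, 6-turn, and 12-turn buckets each end up with three dialogues, so long conversations are no longer underrepresented. Subsampling discards data; oversampling rare turn counts would be an alternative under the same assumption.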

Paper Structure

This paper contains 42 sections, 2 equations, 11 figures, 6 tables.

Figures (11)

  • Figure 1: Example of Format Inertia in Medical Pre-Consultation. When trained on skewed turn-count distribution, the model overly relies on previously generated question patterns—preserving superficial format but failing to contribute new diagnostic information (#2→#10) and repeating identical questions (#11→#12). Format Inertia not only stalls clinical progress but also leaves the patient feeling confused, thereby undermining the overall user experience.
  • Figure 2: Models trained on skewed turn-count data show a progressive increase in Jaccard and Cosine similarity across dialogue turns, indicating an intensifying pattern of repetitive questioning driven by Format Inertia, in contrast to the base model.
  • Figure 3: Impact of skewed turn-count data on TCSR. The frequency of each turn count in the skewed training data (left y-axis) is inversely related to the Task-Constraint failure rate (1-TCSR) in evaluation (right y-axis), highlighting performance degradation on underrepresented long turns.
  • Figure 4: Interface of the medical pre-consultation platform where doctor and patient models engage in interactive pre-consultation dialogues.
  • Figure 5: The Evaluation Prompt.
  • ...and 6 more figures
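
The repetition trend described for Figure 2 can be probed with a small sketch that scores each question against the one before it. This is an illustration under stated assumptions, not the paper's measurement code: tokens are whitespace-split lowercase words, and cosine similarity is computed on word-count vectors rather than on whatever representation the authors used. The helper names are hypothetical.

```python
import math
from collections import Counter


def jaccard(a, b):
    """Jaccard similarity between the word sets of two strings."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    if not sa and not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


def cosine(a, b):
    """Cosine similarity between word-count vectors of two strings."""
    ca, cb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(ca[t] * cb[t] for t in ca)  # Counter returns 0 for missing words
    norm_a = math.sqrt(sum(v * v for v in ca.values()))
    norm_b = math.sqrt(sum(v * v for v in cb.values()))
    norm = norm_a * norm_b
    return dot / norm if norm else 0.0


def consecutive_similarity(questions):
    """Return (jaccard, cosine) for each pair of adjacent questions."""
    return [(jaccard(p, q), cosine(p, q)) for p, q in zip(questions, questions[1:])]


# Made-up doctor questions; the last two are identical.
questions = [
    "When did the pain start?",
    "Where is the pain located?",
    "Where is the pain located?",
]
scores = consecutive_similarity(questions)
```

Scores that climb toward 1.0 in later turns would indicate the repeated questioning that the caption attributes to Format Inertia.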