Ask Again, Then Fail: Large Language Models' Vacillations in Judgment
Qiming Xie, Zengzhi Wang, Yi Feng, Rui Xia
TL;DR
The paper identifies a pervasive judgment consistency issue in state-of-the-art LLMs: models often revise correct answers when confronted with follow-up questioning. It proposes a Follow-up Questioning Mechanism and two metrics to quantify this wavering, then demonstrates the universality of the problem across multiple models and domains. To mitigate the issue, it offers training-free prompting strategies and a training-based framework, Unwavering-FQ, which uses polarized preference context distillation and direct preference optimization to preserve initial correct judgments while maintaining overall conversational ability (as reflected in MT-Bench). Empirical results show meaningful improvements in judgment consistency and general capabilities, with data and prompts released to support future research. The work advances evaluation paradigms for LLM reliability and provides practical mitigation paths for both closed-source and open-source models.
Abstract
We observe that current conversational language models often waver in their judgments when faced with follow-up questions, even if the original judgment was correct. This wavering presents a significant challenge for generating reliable responses and building user trust. To comprehensively assess this issue, we introduce a \textsc{Follow-up Questioning Mechanism} along with two metrics to quantify this inconsistency, confirming its widespread presence in current language models. To mitigate this issue, we explore various prompting strategies for closed-source models; moreover, we develop a training-based framework \textsc{Unwavering-FQ} that teaches language models to maintain their originally correct judgments through synthesized high-quality preference data. Our experimental results confirm the effectiveness of our framework and its ability to enhance the general capabilities of models.
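As a rough illustration of how wavering under follow-up questioning might be quantified, the sketch below asks each question, challenges the answer with a follow-up, and compares accuracy before and after. The abstract does not define the two metrics, so the ones used here (the absolute accuracy drop and that drop relative to initial accuracy) are assumptions, as are the function name `judgment_consistency`, the `ask` callback, the exact-match scoring, and the default follow-up prompt.

```python
from typing import Callable, List, Tuple


def judgment_consistency(
    samples: List[Tuple[str, str]],
    ask: Callable[[List[str]], str],
    follow_up: str = "Are you sure?",
) -> Tuple[float, float]:
    """Return (accuracy drop, relative accuracy drop) after one follow-up turn.

    `samples` holds (question, gold_answer) pairs. `ask` is a hypothetical
    callback that takes the conversation so far (a list of turns) and returns
    the model's next answer as a string.
    """
    if not samples:
        raise ValueError("samples must not be empty")

    correct_before = 0
    correct_after = 0
    for question, gold in samples:
        # Initial judgment on the bare question.
        first = ask([question])
        # Second judgment after the follow-up challenge.
        second = ask([question, first, follow_up])
        correct_before += int(first == gold)
        correct_after += int(second == gold)

    acc_before = correct_before / len(samples)
    acc_after = correct_after / len(samples)

    # Assumed metric 1: absolute drop in accuracy caused by the follow-up.
    drop = acc_before - acc_after
    # Assumed metric 2: the drop as a fraction of the initial accuracy.
    relative_drop = drop / acc_before if acc_before > 0 else 0.0
    return drop, relative_drop
```

In practice `ask` would wrap a call to a chat model and the string comparison would be replaced by task-specific answer matching; a model that never changes a correct answer yields a drop of zero under both measures.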
