Short-Path Prompting in LLMs: Analyzing Reasoning Instability and Solutions for Robust Performance
Zuoli Tang, Junjie Ou, Kaiqin Hu, Chunwei Wu, Zhaoxin Huan, Chilin Fu, Xiaolu Zhang, Jun Zhou, Chenliang Li
TL;DR
The paper investigates how short-path prompts undermine reasoning in state-of-the-art LLMs because of conflicts with hidden chain-of-thought (CoT) prompts. It introduces GSM8K-new and GSM8K-new-choice to rigorously evaluate reasoning under varied prompts. Using these benchmarks, it shows that performance collapses on multi-step problems and that models exhibit strong positional bias in multiple-choice (MC) settings. To address these issues, it proposes two strategies. The first is an instruction-guided method that resolves the prompt conflict through system prompts. The second is rule-based filter fine-tuning (RFFT), which trains intrinsic resistance to short-path prompts. Experiments across four reasoning benchmarks show that both methods substantially improve robustness and accuracy, with RFFT offering stronger resistance on MC tasks and preserving instruction-following capabilities. The work highlights the practical importance of designing prompts and training regimes that balance adherence to user instructions with reliable reasoning in LLMs.
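Positional bias in MC settings can be quantified in several ways. One simple diagnostic compares how often the model picks each option letter against how often each letter is actually the gold answer, and reports accuracy conditioned on the gold position. The sketch that follows is a hypothetical illustration, not necessarily the metric used in the paper. The function name `positional_bias_report` and the four-option `ABCD` layout are assumptions.

```python
from collections import Counter


def positional_bias_report(predictions, golds, options="ABCD"):
    """Summarize how a model's picks are distributed over option positions.

    Hypothetical diagnostic (not the paper's metric).
    predictions, golds: equal-length sequences of option letters such as "A".
    Returns (pick_rate, gold_rate, acc_by_gold), each a dict keyed by letter.
    """
    if len(predictions) != len(golds) or not golds:
        raise ValueError("need equal-length, non-empty sequences")
    n = len(golds)
    picks = Counter(predictions)
    gold_counts = Counter(golds)

    # Share of all predictions that land on each option position.
    pick_rate = {o: picks[o] / n for o in options}
    # Share of questions whose correct answer sits at each option position.
    gold_rate = {o: gold_counts[o] / n for o in options}

    # Accuracy restricted to questions whose gold answer is at that position;
    # None when no question has its gold answer there.
    acc_by_gold = {}
    for o in options:
        idx = [i for i, g in enumerate(golds) if g == o]
        if idx:
            acc_by_gold[o] = sum(predictions[i] == o for i in idx) / len(idx)
        else:
            acc_by_gold[o] = None
    return pick_rate, gold_rate, acc_by_gold


if __name__ == "__main__":
    preds = ["A", "A", "A", "B"]
    golds = ["A", "B", "C", "D"]
    pick, gold, acc = positional_bias_report(preds, golds)
    print(pick)  # the model piles onto "A" although gold answers are uniform
    print(acc)
```

A model without positional bias should have `pick_rate` close to `gold_rate`. A strong skew toward one letter, combined with accuracy that varies sharply by gold position, is the pattern the term "positional bias" describes.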
Abstract
Recent years have witnessed significant progress in the reasoning abilities of large language models (LLMs). This progress is largely due to chain-of-thought (CoT) approaches, which allow models to generate intermediate reasoning steps before reaching the final answer. Building on these advances, state-of-the-art LLMs are instruction-tuned to produce long and detailed CoT pathways when responding to reasoning-related questions. However, humans are natural cognitive misers and tend to prompt language models for short responses, which creates a significant conflict with CoT reasoning. In this paper, we examine how LLMs' reasoning performance changes when users provide short-path prompts. Our results and analysis reveal that language models can reason effectively and robustly without explicit CoT prompts. Under short-path prompting, however, their reasoning ability drops significantly and becomes unstable, even on grade-school problems. To address this issue, we propose two approaches, an instruction-guided approach and a fine-tuning approach, both designed to manage this conflict effectively. Experimental results show that both methods achieve high accuracy, offering insights into the trade-off between instruction adherence and reasoning accuracy in current models.
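To make the short-path setting concrete, the following sketch compares a model's accuracy on the same questions with and without an appended short-path instruction. It is an illustrative harness, not the paper's evaluation code. The suffix wording, the last-number answer extraction, and the names `accuracy` and `short_path_drop` are all assumptions, and `model` stands for any hypothetical callable that maps a prompt string to a response string.

```python
import re
from typing import Callable, Optional, Sequence, Tuple

# Hypothetical short-path instruction; the paper's actual prompts may differ.
SHORT_PATH_SUFFIX = " Answer with only the final number, no explanation."


def extract_final_number(response: str) -> Optional[str]:
    """Return the last number in the response (commas stripped), or None."""
    nums = re.findall(r"-?\d+(?:\.\d+)?", response.replace(",", ""))
    return nums[-1] if nums else None


def accuracy(model: Callable[[str], str], questions: Sequence[str],
             answers: Sequence[str], suffix: str = "") -> float:
    """Fraction of questions whose extracted final number equals the gold answer."""
    correct = 0
    for q, gold in zip(questions, answers):
        pred = extract_final_number(model(q + suffix))
        # A missing number counts as wrong; otherwise compare numerically.
        correct += pred is not None and float(pred) == float(gold)
    return correct / len(questions)


def short_path_drop(model: Callable[[str], str], questions: Sequence[str],
                    answers: Sequence[str]) -> Tuple[float, float, float]:
    """Return (plain accuracy, short-path accuracy, plain minus short-path)."""
    base = accuracy(model, questions, answers)
    short = accuracy(model, questions, answers, SHORT_PATH_SUFFIX)
    return base, short, base - short
```

A large positive drop on grade-school questions would match the degradation described in the abstract. Repeating the comparison over several different short-path phrasings would also expose variance across prompts, which is closer to the instability the abstract reports than a single accuracy difference.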
