Prolonged Reasoning Is Not All You Need: Certainty-Based Adaptive Routing for Efficient LLM/MLLM Reasoning
Jinghui Lu, Haiyang Yu, Siliang Xu, Shiwei Ran, Guozhi Tang, Siqi Wang, Bin Shan, Teng Fu, Hao Feng, Jingqun Tang, Han Wang, Can Huang
TL;DR
Prolonged chain-of-thought is not universally beneficial. The authors introduce Certainty-based Adaptive Reasoning (CAR), which uses perplexity-based confidence estimates to decide whether to return a short answer or invoke long-form reasoning. CAR trains a model to produce both response types, fits the perplexity (PPL) distributions of correct and incorrect short answers, and applies Bayes’ decision rule to route each query at inference time. Empirical results across multimodal VQA/KIE and textual reasoning tasks show that CAR achieves higher accuracy than both purely short-answer and purely long-form approaches while dramatically reducing token usage, and that it outperforms existing token-reduction baselines. The work offers a practical, adaptive framework for balancing accuracy and efficiency in LLM/MLLM reasoning, with potential for synergy with other efficiency techniques.
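The routing step summarized above can be illustrated with a minimal sketch. It assumes, purely for illustration, that the PPL values of correct and incorrect short answers are each fit with a single Gaussian and that the class priors are taken from the counts in the fitting data; the function names (`fit_gaussian`, `route_to_long_reasoning`) are hypothetical and are not taken from the paper.

```python
import math
from statistics import mean, stdev


def fit_gaussian(values):
    """Return (mean, sample standard deviation) of a list of PPL values."""
    return mean(values), stdev(values)


def gaussian_pdf(x, mu, sigma):
    """Density of a normal distribution N(mu, sigma^2) at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def route_to_long_reasoning(ppl, correct_ppls, incorrect_ppls):
    """Bayes decision rule: return True when 'short answer is incorrect'
    has the larger (unnormalized) posterior given the observed PPL.

    Assumption: Gaussian class-conditional densities and empirical priors.
    """
    mu_c, sd_c = fit_gaussian(correct_ppls)
    mu_i, sd_i = fit_gaussian(incorrect_ppls)
    total = len(correct_ppls) + len(incorrect_ppls)
    prior_c = len(correct_ppls) / total
    prior_i = len(incorrect_ppls) / total
    score_correct = gaussian_pdf(ppl, mu_c, sd_c) * prior_c
    score_incorrect = gaussian_pdf(ppl, mu_i, sd_i) * prior_i
    return score_incorrect > score_correct


# Toy PPL values (made up for the example, not results from the paper).
correct = [1.0, 1.1, 1.2, 1.05]
incorrect = [1.8, 2.0, 2.2, 1.9]
low_ppl_routes_long = route_to_long_reasoning(1.05, correct, incorrect)
high_ppl_routes_long = route_to_long_reasoning(2.0, correct, incorrect)
```

In this toy setting a low-PPL short answer is kept, while a high-PPL one is routed to long-form reasoning.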
Abstract
Recent advancements in reasoning have significantly enhanced the capabilities of Large Language Models (LLMs) and Multimodal Large Language Models (MLLMs) across diverse tasks. However, excessive reliance on chain-of-thought (CoT) reasoning can impair model performance and produce unnecessarily long outputs, reducing efficiency. Our work reveals that prolonged reasoning does not universally improve accuracy and can even degrade performance on simpler tasks. To address this, we propose Certainty-based Adaptive Reasoning (CAR), a novel framework that dynamically switches between short answers and long-form reasoning based on the model's perplexity. CAR first generates a short answer and evaluates its perplexity, triggering reasoning only when the model exhibits low confidence (i.e., high perplexity). Experiments across diverse multimodal VQA/KIE benchmarks and text reasoning datasets show that CAR outperforms both short-answer and long-form reasoning approaches, striking an optimal balance between accuracy and efficiency.
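As a rough illustration of the two-stage flow described in the abstract, the sketch below computes perplexity as the exponential of the negative mean token log-probability (the standard definition) and falls back to long-form reasoning only when a confidence check flags the short answer. The callables `generate_short`, `generate_long`, and `is_uncertain` are hypothetical stand-ins for a model's decoding routines and for whatever decision rule is used; they are assumptions for this example, not the authors' API.

```python
import math
from typing import Callable, Sequence, Tuple


def perplexity(token_logprobs: Sequence[float]) -> float:
    """Perplexity = exp(-mean natural-log probability of the generated tokens)."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    return math.exp(-sum(token_logprobs) / len(token_logprobs))


def answer_adaptively(
    generate_short: Callable[[], Tuple[str, Sequence[float]]],
    generate_long: Callable[[], str],
    is_uncertain: Callable[[float], bool],
) -> Tuple[str, float, bool]:
    """Return (answer, short-answer PPL, whether long reasoning was used).

    All three callables are hypothetical placeholders supplied by the caller.
    """
    short_answer, logprobs = generate_short()
    ppl = perplexity(logprobs)
    if is_uncertain(ppl):
        # Low confidence (high PPL): pay for long-form reasoning.
        return generate_long(), ppl, True
    # High confidence (low PPL): keep the cheap short answer.
    return short_answer, ppl, False


# Toy usage with stubbed generators and an arbitrary threshold of 1.5.
confident = answer_adaptively(
    lambda: ("A", [math.log(0.9)] * 3), lambda: "B", lambda p: p > 1.5
)
unsure = answer_adaptively(
    lambda: ("A", [math.log(0.4)] * 3), lambda: "B", lambda p: p > 1.5
)
```

The fixed threshold here is only a placeholder; any rule that maps a PPL value to a keep-or-reason decision can be passed as `is_uncertain`.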
