Batch Prompting Suppresses Overthinking Reasoning Under Constraint: How Batch Prompting Suppresses Overthinking in Reasoning Models
Gaurav Singh, Abhishek Dey, Janit Bidhan, Tanu Kansal, Paras Kath, Saurabh Srivastava
TL;DR
This work investigates batch prompting as an inference-time regularizer for large reasoning models. Across 13 benchmarks and two models, batching reduces overthinking and reasoning-token usage while preserving accuracy. By amortizing fixed prompt costs and limiting how much reasoning each query receives within a shared batch context, the approach achieves about a 74% reduction in reasoning tokens with minimal accuracy loss. Batched inference also exhibits emergent phenomena such as pattern induction and hedging suppression. Explicit prompt-based constraints, by contrast, are shown to be ineffective, which highlights batch prompting as a robust, prompt-only solution. The findings offer a practical, model-agnostic method to improve efficiency in latency- and cost-sensitive reasoning deployments.
Abstract
Recent work has explored batch prompting as a strategy to amortize inference cost in large language models (LLMs). In this paper, we show that batching offers an additional, underappreciated benefit: it regularizes model behavior during multi-step reasoning for Large Reasoning Models (LRMs). We conduct a comprehensive study across 13 diverse benchmarks and observe that batching improves accuracy while substantially reducing reasoning token usage, often by 3x-5x. Through detailed behavioral analysis, we find that batching suppresses overthinking, reduces hedging language (e.g., repetitive self-corrections), and encourages more decisive answers. Surprisingly, we also observe emergent collective effects in batched inference: models often generalize patterns from earlier examples to solve harder ones in the same batch. These findings position batching not just as a throughput optimization, but as a powerful inference-time regularizer for more efficient and reliable LLM reasoning.
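To make the mechanics of batch prompting concrete, the sketch below packs several questions into a single prompt and splits a reply back into per-question answers. It is only an illustration under stated assumptions: the `Q[i]:` / `A[i]:` labelling, the instruction wording, and the helper names `build_batch_prompt` and `parse_batch_answers` are hypothetical and are not taken from the paper, and no model call is included.

```python
import re
from typing import Dict, List


def build_batch_prompt(questions: List[str]) -> str:
    """Pack several questions into one prompt with numbered slots.

    The Q[i]/A[i] format is an assumption made for this sketch.
    """
    header = (
        "Answer each question below. "
        "Reply with one line per question in the form 'A[i]: <answer>'.\n\n"
    )
    body = "\n".join(f"Q[{i}]: {q}" for i, q in enumerate(questions, start=1))
    return header + body


def parse_batch_answers(response: str, n: int) -> Dict[int, str]:
    """Map question index -> answer text, ignoring indices outside 1..n."""
    answers: Dict[int, str] = {}
    # Match lines such as "A[2]: Paris" anywhere in the reply.
    for match in re.finditer(r"^A\[(\d+)\]:\s*(.+)$", response, flags=re.MULTILINE):
        idx = int(match.group(1))
        if 1 <= idx <= n:
            answers[idx] = match.group(2).strip()
    return answers
```

In use, the string returned by `build_batch_prompt` would be sent as one request in place of one request per question, and the reply would be passed to `parse_batch_answers` along with the batch size. Any question whose index is missing from the returned dictionary can then be retried on its own.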
