Are More Tokens Rational? Inference-Time Scaling in Language Models as Adaptive Resource Rationality
Zhimin Hu, Riya Roshan, Sashank Varma
TL;DR
This paper investigates whether resource rationality emerges from inference-time scaling in large language models, using the Variable Attribution Task (VAT) to control task complexity. It compares instruction-tuned models with large reasoning models and finds a robust shift from permutation to elimination strategies as complexity grows. XOR/XNOR functions often resist pruning, which indicates that resource allocation depends on the logical function as well as on problem size. Crucially, this adaptive behavior arises without explicit cost-based rewards, suggesting that resource rationality is an emergent property of extended inference. The findings suggest that reasoning traces and token-length investments reflect internal resource reallocation under finite capacity, which informs both the interpretation of LLM behavior and the design of computation-aware prompting strategies.
Abstract
Human reasoning is shaped by resource rationality, the optimization of performance under resource constraints. Recently, inference-time scaling has emerged as a powerful paradigm for improving the reasoning performance of Large Language Models (LLMs) by expanding test-time computation. Specifically, instruction-tuned (IT) models explicitly generate long reasoning steps during inference, whereas Large Reasoning Models (LRMs) are trained by reinforcement learning to discover reasoning paths that maximize accuracy. However, it remains unclear whether resource rationality can emerge from such scaling without an explicit reward tied to computational cost. We introduce a Variable Attribution Task (VAT) in which models infer which variables determine outcomes, given candidate variables, input-output trials, and predefined logical functions. By varying the number of candidate variables and trials, we systematically manipulate task complexity. Both model types exhibit a transition from brute-force to analytic strategies as complexity increases. IT models degrade on XOR and XNOR functions, whereas LRMs remain robust. These findings suggest that models can adjust their reasoning behavior in response to task complexity, even without explicit cost-based reward. This provides compelling evidence that resource rationality is an emergent property of inference-time scaling itself.
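To make the task description concrete, the following is a minimal Python sketch of a VAT-style problem. It is an illustration under stated assumptions, not the authors' implementation: it assumes binary variables, an outcome determined by exactly two of the candidates, and a four-function set (AND, OR, XOR, XNOR). The names `make_trials`, `brute_force`, `relevant_variables`, and `hypothesis_count` are hypothetical. The sketch contrasts a brute-force strategy, which tests every hypothesis against every trial, with one possible analytic strategy that marks a variable as relevant when flipping it alone changes the outcome.

```python
from itertools import combinations, product
from math import comb

# Assumed function set for illustration; the section does not list the exact set.
FUNCTIONS = {
    "AND": lambda a, b: a & b,
    "OR": lambda a, b: a | b,
    "XOR": lambda a, b: a ^ b,
    "XNOR": lambda a, b: 1 - (a ^ b),
}


def make_trials(n_vars, causal_pair, func_name):
    """Label every binary input vector with the hidden two-variable rule."""
    f = FUNCTIONS[func_name]
    i, j = causal_pair
    return [(x, f(x[i], x[j])) for x in product((0, 1), repeat=n_vars)]


def hypothesis_count(n_vars):
    """Size of the brute-force search space: variable pairs times functions."""
    return comb(n_vars, 2) * len(FUNCTIONS)


def brute_force(n_vars, trials):
    """Keep every (pair, function) hypothesis that matches all trials."""
    consistent = []
    for i, j in combinations(range(n_vars), 2):
        for name, f in FUNCTIONS.items():
            if all(f(x[i], x[j]) == y for x, y in trials):
                consistent.append(((i, j), name))
    return consistent


def relevant_variables(n_vars, trials):
    """Analytic shortcut: a variable matters if flipping it alone flips the outcome."""
    lookup = dict(trials)
    relevant = set()
    for x, y in trials:
        for k in range(n_vars):
            flipped = x[:k] + (1 - x[k],) + x[k + 1:]
            if flipped in lookup and lookup[flipped] != y:
                relevant.add(k)
    return sorted(relevant)


# Hidden rule (hypothetical): outcome = x1 XOR x3, with four candidate variables.
trials = make_trials(4, (1, 3), "XOR")
result = brute_force(4, trials)
print(result)
print(relevant_variables(4, trials))
print(hypothesis_count(4), hypothesis_count(10))
```

Under these assumptions the brute-force search space grows quadratically with the number of candidate variables, whereas the flip-based check inspects variables one at a time. This is one way to see why a shift away from exhaustive enumeration becomes attractive as complexity increases.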
