Computational Economics in Large Language Models: Exploring Model Behavior and Incentive Design under Resource Constraints
Sandeep Reddy, Kabir Khan, Rohit Patil, Ananya Chakraborty, Faizan A. Khan, Swati Kulkarni, Arjun Verma, Neha Singh
TL;DR
The paper addresses the high computational cost of large language models by introducing a computational economics framework that treats internal components (attention heads and FFN blocks) as rational agents operating under a finite budget. It shows that scarcity drives the model to reallocate computation toward high-utility tokens while largely preserving task performance, revealing an internal economy in action. By adding a differentiable computational-cost term to the loss, it trains a family of models that trace a Pareto frontier of accuracy and efficiency, with substantial reductions in FLOPs and latency and sparser, more interpretable activations. Results on GLUE tasks and WikiText-103 demonstrate practical gains and offer a principled route to efficient, adaptive, and transparent LLMs under resource constraints, with broader implications for deployment and interpretability.
Abstract
Large language models (LLMs) are limited by substantial computational cost. We introduce a "computational economics" framework that treats an LLM as an internal economy of resource-constrained agents (attention heads and neuron blocks) that must allocate scarce computation to maximize task utility. First, we show empirically that when computation is scarce, standard LLMs reallocate attention toward high-value tokens while preserving accuracy. Building on this observation, we propose an incentive-driven training paradigm that augments the task loss with a differentiable computation-cost term, encouraging sparse and efficient activations. On GLUE (MNLI, STS-B, CoLA) and WikiText-103, the method yields a family of models that trace a Pareto frontier and consistently dominate post-hoc pruning; at similar accuracy, we obtain roughly a forty percent reduction in FLOPs and lower latency, together with more interpretable attention patterns. These results indicate that economic principles offer a principled route to designing efficient, adaptive, and more transparent LLMs under strict resource constraints.
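To make the idea of a task loss augmented with a differentiable computation-cost term concrete, the following is a minimal sketch and not the authors' formulation, which this section does not specify. It assumes each component (an attention head or neuron block) has a learnable gate logit, that the gate's opening probability is a sigmoid of that logit, and that the cost term is the expected FLOPs normalized by the full-compute budget. The names `gate_logits`, `unit_flops`, and `price` are hypothetical.

```python
import math


def sigmoid(x: float) -> float:
    """Smooth gate-open probability, so the cost stays differentiable."""
    return 1.0 / (1.0 + math.exp(-x))


def expected_cost(gate_logits, unit_flops):
    """Expected FLOPs: each component's cost weighted by its gate probability.

    Assumption: one scalar gate logit per component (hypothetical setup).
    """
    return sum(sigmoid(g) * c for g, c in zip(gate_logits, unit_flops))


def total_loss(task_loss, gate_logits, unit_flops, price):
    """Task loss plus a price-weighted compute cost.

    The cost is divided by the budget (all gates fully open) so that
    `price` is expressed in loss units per fraction of compute used.
    """
    budget = sum(unit_flops)
    return task_loss + price * expected_cost(gate_logits, unit_flops) / budget


# Hypothetical example: three components, one mostly on, one mostly off,
# and one larger component at the midpoint.
gate_logits = [2.0, -2.0, 0.0]
unit_flops = [100.0, 100.0, 200.0]
loss = total_loss(task_loss=1.0, gate_logits=gate_logits,
                  unit_flops=unit_flops, price=0.2)
```

Under these assumptions, sweeping `price` from zero upward would produce models at different points on an accuracy-versus-compute trade-off, which is one way to read the Pareto frontier described above.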
