A Resource Model For Neural Scaling Law
Jinyeop Song, Ziming Liu, Max Tegmark, Jeff Gore
TL;DR
The paper approaches neural scaling laws through a resource-based view in which a composite task is decomposed into subtasks that compete for neurons. Toy experiments show that the loss of a single subtask scales as $\ell \propto N^{-1}$ in the number of allocated neurons $N$, and that when several subtasks are present their allocations grow homogeneously with model size. Together these give a general scaling relation which, under width-depth scaling, becomes $\ell \propto N_p^{-1/3}$ in the number of parameters $N_p$, consistent with the Chinchilla results. The framework is extended to parallel and series compositions, arguing for homogeneous growth of neuron redundancies and linear additivity of subtasks; this yields $\ell \propto N^{-1}$ for composite tasks and suggests that the result applies to general composite tasks. The approach offers a simple lens for diagnosing and guiding neural network scaling, with implications for predicting LLM performance and for understanding modularity in deep networks.
Abstract
Neural scaling laws characterize how model performance improves as the model size scales up. Inspired by empirical observations, we introduce a resource model of neural scaling. A task is usually composite and can therefore be decomposed into many subtasks, which compete for resources (measured by the number of neurons allocated to each subtask). On toy problems, we empirically find that: (1) The loss of a subtask is inversely proportional to its allocated neurons. (2) When multiple subtasks are present in a composite task, the resources acquired by each subtask grow uniformly as models get larger, keeping the ratios of acquired resources constant. We hypothesize that these findings hold generally and build a model to predict neural scaling laws for general composite tasks, which successfully replicates the neural scaling law of Chinchilla models reported in arXiv:2203.15556. We believe that the notion of resource used in this paper will be a useful tool for characterizing and diagnosing neural networks.
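The two empirical findings can be combined in a few lines of arithmetic. The sketch below is illustrative only and is not the authors' code: it assumes each subtask loss is exactly $c_i / N_i$, that allocations are $N_i = r_i N$ with fixed ratios $r_i$, and that the composite loss is the sum of subtask losses. The ratios, constants, and the relation $N_p \propto N^3$ (one reading of "width-depth scaling") are hypothetical choices made for the example.

```python
import math

def composite_loss(total_neurons, ratios, coeffs):
    """Sum of subtask losses c_i / N_i, with allocations N_i = r_i * N."""
    return sum(c / (r * total_neurons) for r, c in zip(ratios, coeffs))

def loglog_slope(xs, ys):
    """Least-squares slope of log(y) against log(x)."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    mx = sum(lx) / len(lx)
    my = sum(ly) / len(ly)
    num = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    den = sum((a - mx) ** 2 for a in lx)
    return num / den

# Hypothetical values: fixed allocation ratios and per-subtask constants.
ratios = [0.5, 0.3, 0.2]
coeffs = [1.0, 2.0, 0.5]

neurons = [2 ** k for k in range(4, 12)]
losses = [composite_loss(n, ratios, coeffs) for n in neurons]

# Exponent of the loss with respect to the neuron count N.
slope_n = loglog_slope(neurons, losses)

# Assumption: width and depth grow together, so parameters scale as N^3.
params = [n ** 3 for n in neurons]
slope_np = loglog_slope(params, losses)

print(f"exponent in N:   {slope_n:.3f}")
print(f"exponent in N_p: {slope_np:.3f}")
```

Because the ratios stay fixed, every subtask loss shrinks by the same factor as $N$ grows, so the fitted exponent in $N$ is $-1$ regardless of the particular ratios or constants chosen; the cubic assumption then turns this into an exponent of $-1/3$ in $N_p$.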
