On the Complexity of Neural Computation in Superposition
Micah Adler, Nir Shavit
TL;DR
This work establishes foundational bounds for computing in superposition in neural networks, showing that for problems such as Neural Permutation and 2-AND, a network with $n$ neurons can compute at most $O(n^2 / \log n)$ features, because any correct computation of $m'$ features requires at least $\Omega(\sqrt{m' \log m'})$ neurons. The authors prove the lower bounds via an information-theoretic parameterization argument and provide near-tight constructive upper bounds using a three-channel, compressed encoding scheme that performs all operations with ReLU activations and thresholding. They also develop a detailed, multi-case upper-bound construction for 2-AND under varying feature influences and extend the framework to bit-level parameter complexity, multiple inputs, and multi-layer architectures. The results reveal an exponential gap between the capacity for passive feature representation and the capacity for active computation in superposition. The authors also discuss implications for mechanistic interpretability and potential extensions to large language models and Boolean circuits.
Abstract
Superposition, the ability of neural networks to represent more features than neurons, is increasingly seen as key to the efficiency of large models. This paper investigates the theoretical foundations of computing in superposition, establishing complexity bounds for explicit, provably correct algorithms. We present the first lower bounds for a neural network computing in superposition, showing that for a broad class of problems, including permutations and pairwise logical operations, computing $m'$ features in superposition requires at least $\Omega(\sqrt{m' \log m'})$ neurons and $\Omega(m' \log m')$ parameters. This implies the first subexponential upper bound on superposition capacity: a network with $n$ neurons can compute at most $O(n^2 / \log n)$ features. Conversely, we provide a nearly tight constructive upper bound: logical operations like pairwise AND can be computed using $O(\sqrt{m'} \log m')$ neurons and $O(m' \log^2 m')$ parameters. There is thus an exponential gap between the complexity of computing in superposition (the subject of this work) and that of merely representing features, which can require as few as $O(\log m')$ neurons based on the Johnson-Lindenstrauss Lemma. Our hope is that our results open a path for using complexity-theoretic techniques in neural network interpretability research.
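As a brief sketch of why the neuron lower bound implies the stated capacity bound, suppose computing $m'$ features requires $n \ge c\sqrt{m' \log m'}$ neurons. The constant $c > 0$, the assumption $n \ge 2$, and the case split below are introduced here for illustration only and are not taken from the paper's proof.

$$
\begin{aligned}
n \ge c\sqrt{m' \log m'} \;&\Longrightarrow\; m' \log m' \le \frac{n^2}{c^2}, \\
\text{if } m' \le n: \quad & m' \le n \le \frac{n^2}{\log n}, \\
\text{if } m' > n: \quad & m' \le \frac{n^2}{c^2 \log m'} < \frac{n^2}{c^2 \log n}.
\end{aligned}
$$

In both cases $m' = O(n^2 / \log n)$. By contrast, if representing $m'$ features takes only $n \le C \log m'$ neurons for some constant $C$ (also an illustrative symbol), then $m' \ge 2^{n/C}$, so $n$ neurons can represent $2^{\Omega(n)}$ features. Comparing $2^{\Omega(n)}$ with $O(n^2 / \log n)$ is one way to read the exponential gap between representing and computing.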
