On the Complexity of Neural Computation in Superposition

Micah Adler; Nir Shavit

TL;DR

This work establishes foundational bounds for computing in neural networks using superposition, showing that for problems like Neural Permutation and 2-AND, a network with $n$ neurons can compute at most $O(n^2 / \log n)$ features, while computing $m'$ features requires at least $\Omega(\sqrt{m' \log m'})$ neurons. The authors prove lower bounds via an information-theoretic parameterization argument and provide near-tight constructive upper bounds using a three-channel, compressed encoding scheme that performs all operations with ReLU activations and thresholding. They also develop a detailed, multi-case upper-bound construction for 2-AND under varying feature influences and extend the framework to bit-level parameter complexity, multiple inputs, and multi-layer architectures. The results reveal an exponential gap between the capacity for passive feature representation and active computation in superposition, and they discuss implications for mechanistic interpretability and potential extensions to large language models and Boolean circuits.

Abstract

Superposition, the ability of neural networks to represent more features than neurons, is increasingly seen as key to the efficiency of large models. This paper investigates the theoretical foundations of computing in superposition, establishing complexity bounds for explicit, provably correct algorithms. We present the first lower bounds for a neural network computing in superposition, showing that for a broad class of problems, including permutations and pairwise logical operations, computing $m'$ features in superposition requires at least $\Omega(\sqrt{m' \log m'})$ neurons and $\Omega(m' \log m')$ parameters. This implies the first subexponential upper bound on superposition capacity: a network with $n$ neurons can compute at most $O(n^2 / \log n)$ features. Conversely, we provide a nearly tight constructive upper bound: logical operations like pairwise AND can be computed using $O(\sqrt{m'} \log m')$ neurons and $O(m' \log^2 m')$ parameters. There is thus an exponential gap between the complexity of computing in superposition (the subject of this work) and that of merely representing features, which can require as few as $O(\log m')$ neurons based on the Johnson-Lindenstrauss Lemma. Our hope is that our results open a path for using complexity-theoretic techniques in neural network interpretability research.
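
The step from the neuron lower bound to the capacity bound is a short inversion. The following worked derivation is a sketch under stated assumptions: $c > 0$ is a hypothetical constant hidden in the $\Omega(\cdot)$, and $m' \ge n \ge 2$ (the superposition regime, where features outnumber neurons).

```latex
\begin{align*}
n &\ge c\sqrt{m' \log m'}
  && \text{(neuron lower bound, with an assumed constant } c > 0\text{)} \\
n^2 &\ge c^2\, m' \log m' \;\ge\; c^2\, m' \log n
  && \text{(squaring; } m' \ge n \text{ gives } \log m' \ge \log n\text{)} \\
m' &\le \frac{n^2}{c^2 \log n} \;=\; O\!\left(\frac{n^2}{\log n}\right)
  && \text{(dividing by } c^2 \log n > 0\text{)}
\end{align*}
```

By contrast, the $O(\log m')$ neuron count for representation corresponds, for a fixed distortion, to a number of representable features that grows exponentially in $n$, which is the gap the abstract refers to.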

Paper Structure (24 sections, 9 theorems, 19 equations, 5 figures)

This paper contains 24 sections, 9 theorems, 19 equations, 5 figures.

Introduction
Problem Formulation
Our Results
Related Work
Modeling Neural Computation
Lower Bound Model: Parameter Driven Algorithms
Upper bound model: Multi-layer perceptrons
Lower Bounds
Parameter Driven Algorithms with Errors
Possible Extensions to LLM Parameterization
Upper Bounds
Input Encoding and Neural Permutation
Algorithm for 2-AND with maximum feature influence 1
High level outline of main algorithm
Algorithm for double light outputs
...and 9 more sections

Key Result

Theorem 3.1

Let $U$ and $V$ be finite sets, and let $\mathcal{F}\subseteq\{\,F : U \to V\}$ be a set of distinct functions. Suppose $T$ is a parameter driven algorithm for $\mathcal{F}$, with parameter function $P(F)$ mapping each $F\in\mathcal{F}$ to a bit string. Then, for almost all $F \in \mathcal{F}$, we have $\bigl|P(F)\bigr|\ge\log_2 \bigl|\mathcal{F}\bigr|.$
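
One way to see why a bound of this form is plausible is a standard counting argument. The Python sketch below is illustrative only and is not taken from the paper's proof. It assumes that distinct functions must receive distinct parameter strings (otherwise the algorithm could not tell them apart), and all function names are hypothetical.

```python
import math


def count_strings_shorter_than(t: int) -> int:
    """Number of distinct bit strings of length strictly less than t.

    Lengths 0, 1, ..., t-1 contribute 2^0 + 2^1 + ... + 2^(t-1) = 2^t - 1.
    """
    return 2**t - 1


def fraction_with_short_parameters(num_functions: int, t: int) -> float:
    """Upper bound on the fraction of functions describable in fewer than t bits.

    Assumption (labeled in the lead-in): each function needs its own
    distinct parameter string, so at most 2^t - 1 functions can be
    assigned a string shorter than t bits.
    """
    return min(1.0, count_strings_shorter_than(t) / num_functions)


def log2_num_permutations(m: int) -> float:
    """log2(m!), the number of bits needed to name one permutation of m items.

    Uses lgamma(m + 1) = ln(m!) to avoid computing huge factorials.
    """
    return math.lgamma(m + 1) / math.log(2)


if __name__ == "__main__":
    m = 8  # illustrative number of features being permuted
    num_functions = math.factorial(m)
    bits_needed = log2_num_permutations(m)
    slack = 5  # how many bits below log2|F| we try to compress
    t = math.floor(bits_needed) - slack
    frac = fraction_with_short_parameters(num_functions, t)
    print(f"log2(m!) = {bits_needed:.3f} bits")
    print(f"fraction describable in fewer than {t} bits <= {frac:.4f}")
```

Because $2^t - 1 < 2^t$, choosing $t = \lfloor \log_2 |\mathcal{F}| \rfloor - s$ leaves at most a $2^{-s}$ fraction of functions with shorter parameter strings. For permutations of $m'$ items, $\log_2(m'!) = \Theta(m' \log m')$, which matches the order of the parameter lower bound quoted in the abstract.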

Figures (5)

Figure 2: 2-AND computation
Figure 3: Resulting $n \times n$ matrices used for inference
Figure 4: A partition of 2-AND into two subproblems. The red regions compute one subproblem, and the blue regions the other. All entries in other regions will be 0. Note that in $C_1'$, the rows do overlap. This is to set up the outputs of this layer as the inputs to the next layer, specifically to allow the same resulting input to appear in up to two subproblems.
Figure 5: The matrix $C_0$ for Low-Influence-AND. The circled 1s are those that correspond to $s_i$, where output $i$ computes $j_1 \land j_2$, and thus the 1s in those rows will line up between $j_1$ and $j_2$. Other rows with 1s come from different column specifications, and thus only line up by chance, but when that happens it causes spurious 1s to appear after the second ReLU. When there are at most $O(m'^{\,1/4} \log m')$ total 1s in each column, it is likely there will be $O(\log m')$ such spurious 1s. However, since $n = O(\sqrt{m'} \log m')$, if there were more 1s in both columns, the number of spurious 1s would become too large to handle. This is why $m'^{\,1/4}$ represents such an important phase change for what techniques are effective for this problem.
Figure 6: The dependence in light inputs between the different choices for $C_0(e,a)$, when $a \in L(C_0)$. Here $a_1$ and $a_2$ are light inputs, and so $a_1, a_2 \in L(C_0)$, and $h$ is a heavy input such that both $h \land a_1$ and $h \land a_2$ are computed. If we ignore the impact of other heavy inputs, then $C_0(e,h) = 0$ implies both $C_0(e,a_1) = 0$ and $C_0(e,a_2) = 0$. Thus, $\Pr[C_0(e,a_2) = 1 \mid C_0(e,a_1) = 1] \gg \Pr[C_0(e,a_2) = 1 \mid C_0(e,a_1) = 0]$, and so $C_0(e,a_2)$ and $C_0(e,a_1)$ are not independent.

Theorems & Definitions (34)

Theorem 3.1
proof
Theorem 3.2
proof
Corollary 3.2.1
proof
Corollary 3.2.2
proof
Theorem 4.1
proof
...and 24 more
