On the Complexity of Neural Computation in Superposition

Micah Adler, Nir Shavit

TL;DR

This work establishes foundational bounds for computing in neural networks using superposition, showing that for problems like Neural Permutation and 2-AND, a network with $n$ neurons can compute at most $O(n^2 / \log n)$ features, while any correct computation of $m'$ features requires at least $n = \Omega(\sqrt{m' \log m'})$ neurons. The authors prove lower bounds via an information-theoretic parameterization argument and provide near-tight constructive upper bounds using a three-channel, compressed encoding scheme that performs all operations with ReLU activations and thresholding. They also develop a detailed, multi-case upper-bound construction for 2-AND under varying feature influences and extend the framework to bit-level parameter complexity, multiple inputs, and multi-layer architectures. The results reveal an exponential gap between the capacity for passive feature representation and active computation in superposition, and they discuss implications for mechanistic interpretability and potential extensions to large language models and Boolean circuits.

Abstract

Superposition, the ability of neural networks to represent more features than neurons, is increasingly seen as key to the efficiency of large models. This paper investigates the theoretical foundations of computing in superposition, establishing complexity bounds for explicit, provably correct algorithms. We present the first lower bounds for a neural network computing in superposition, showing that for a broad class of problems, including permutations and pairwise logical operations, computing $m'$ features in superposition requires at least $\Omega(\sqrt{m' \log m'})$ neurons and $\Omega(m' \log m')$ parameters. This implies the first subexponential upper bound on superposition capacity: a network with $n$ neurons can compute at most $O(n^2 / \log n)$ features. Conversely, we provide a nearly tight constructive upper bound: logical operations like pairwise AND can be computed using $O(\sqrt{m'} \log m')$ neurons and $O(m' \log^2 m')$ parameters. There is thus an exponential gap between the complexity of computing in superposition (the subject of this work) versus merely representing features, which can require as little as $O(\log m')$ neurons based on the Johnson-Lindenstrauss Lemma. Our hope is that our results open a path for using complexity theoretic techniques in neural network interpretability research.
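
The step from the neuron lower bound to the capacity bound can be spelled out in a short derivation. This is a sketch only: $c > 0$ stands for the unnamed constant hidden in the $\Omega$, an assumption introduced here for illustration, and $n \ge 2$ is assumed so that $\log n > 0$.

$$
n \ge c\sqrt{m' \log m'} \;\Longrightarrow\; m' \log m' \le \frac{n^2}{c^2}.
$$

If $m' \le n$, then $m' = O(n^2 / \log n)$ holds trivially because $\log n \le n$. Otherwise $m' > n$, so $\log m' > \log n$ and

$$
m' \le \frac{n^2}{c^2 \log m'} < \frac{n^2}{c^2 \log n} = O\!\left(\frac{n^2}{\log n}\right).
$$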

Paper Structure (24 sections, 9 theorems, 19 equations, 5 figures)

Key Result

Theorem 3.1

Let $U$ and $V$ be finite sets, and let $\mathcal{F}\subseteq\{\,F : U \to V\}$ be a set of distinct functions. Suppose $T$ is a parameter-driven algorithm for $\mathcal{F}$, with parameter function $P(F)$ mapping each $F\in\mathcal{F}$ to a bit string. If $T(P(F), u) = F(u)$ for all $F \in \mathcal{F}$ and all $u \in U$, then for almost all $F \in \mathcal{F}$, we have $\bigl|P(F)\bigr|\ge\log_2 \bigl|\mathcal{F}\bigr|.$
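
As a rough illustration of how a counting bound of this form yields concrete numbers, the sketch below applies it to the family of all permutations of $m'$ items, which has $m'!$ members, so that $\log_2 |\mathcal{F}| = \log_2(m'!) = \Theta(m' \log m')$ bits. The helper names `log2_factorial` and `min_neurons`, and the modelling assumption of a single $n \times n$ weight matrix with a fixed number of bits per weight, are hypothetical and chosen only for this example; they are not taken from the paper.

```python
import math


def log2_factorial(m: int) -> float:
    """Return log2(m!) via the log-gamma function (avoids huge integers)."""
    return math.lgamma(m + 1) / math.log(2)


def min_neurons(m: int, bits_per_weight: float = 1.0) -> float:
    """Smallest real n with n * n * bits_per_weight >= log2(m!).

    Assumption for illustration only: the parameters form a single
    n x n weight matrix and each entry carries `bits_per_weight` bits.
    """
    return math.sqrt(log2_factorial(m) / bits_per_weight)


if __name__ == "__main__":
    for m in (2**10, 2**16, 2**20):
        bits = log2_factorial(m)
        print(f"m'={m:>8}  log2(m'!)={bits:,.0f} bits  n >= {min_neurons(m):,.1f}")
```

Under these assumptions the required $n$ grows like $\sqrt{m' \log m'}$, which is the shape of the neuron lower bound quoted in the abstract; changing `bits_per_weight` only rescales the constant.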

Figures (5)

  • Figure 2: 2-AND computation
  • Figure 3: Resulting $n \times n$ matrices used for inference
  • Figure 4: A partition of 2-AND into two subproblems. The red regions compute one subproblem, and the blue regions the other. All entries in other regions will be 0. Note that in $C_1'$, the rows do overlap. This is to set up the outputs of this layer as the inputs to next layer, specifically to allow the same resulting input to appear in up to two subproblems.
  • Figure 5: The matrix $C_0$ for Low-Influence-AND. The circled 1s are those that correspond to $s_i$, where output $i$ computes $j_1 \land j_2$, and thus the 1s in those rows will line up between $j_1$ and $j_2$. Other rows with 1s come from different column specifications, and thus only line up by chance, but when that happens it causes spurious 1s to appear after the second ReLU. When there are at most $O(m'^{\,1/4} \log m')$ total 1s in each column, it is likely there will be $O(\log m')$ such spurious 1s. However, since $n = O(\sqrt{m'} \log m')$, if there were more 1s in both columns, the number of spurious 1s would become too large to handle. This is why $m'^{\,1/4}$ represents such an important phase change for what techniques are effective for this problem.
  • Figure 6: The dependence in light inputs between the different choices for $C_0(e,a)$, when $a \in L(C_0)$. Here $a_1$ and $a_2$ are light inputs, so $a_1, a_2 \in L(C_0)$, and $h$ is a heavy input such that both $h \land a_1$ and $h \land a_2$ are computed. If we ignore the impact of other heavy inputs, then $C_0(e,h) = 0$ implies both $C_0(e,a_1) = 0$ and $C_0(e,a_2) = 0$. Thus, $\Pr[C_0(e,a_2) = 1 \mid C_0(e,a_1) = 1] \gg \Pr[C_0(e,a_2) = 1 \mid C_0(e,a_1) = 0]$, and so $C_0(e,a_2)$ and $C_0(e,a_1)$ are not independent.

Theorems & Definitions (34)

  • Theorem 3.1
  • proof
  • Theorem 3.2
  • proof
  • Corollary 3.2.1
  • proof
  • Corollary 3.2.2
  • proof
  • Theorem 4.1
  • proof
  • ...and 24 more