Wide Neural Networks as a Baseline for the Computational No-Coincidence Conjecture

John Dunbar, Scott Aaronson

TL;DR

The paper asks whether random neural networks can exhibit no-coincidence behavior, akin to rare but detectable structural properties in circuits. It develops a first-order perturbative framework and uses Hermite–Mehler analysis to show that, under Gaussian inputs, wide networks with nonlinear zero-mean activations produce outputs that become nearly independent as depth increases. The key insight is that, when the activation has zero mean, repeatedly applying the layer-to-layer map on input covariances drives the covariance between distinct inputs to zero, so the outputs behave like those of a random function in the large-width limit. This provides a practical baseline for the computational no-coincidence conjecture and motivates a neural-network version of it. The paper highlights tanh-like zero-mean activations and outlines limitations and future directions for interpretability research.

Abstract

We establish that randomly initialized neural networks, with large width and a natural choice of hyperparameters, have nearly independent outputs exactly when their activation function is nonlinear with zero mean under the Gaussian measure: $\mathbb{E}_{z \sim \mathcal{N}(0,1)}[\sigma(z)]=0$. For example, this includes ReLU and GeLU with an additive shift, as well as tanh, but not ReLU or GeLU by themselves. Because of their nearly independent outputs, we propose neural networks with zero-mean activation functions as a promising candidate for the Alignment Research Center's computational no-coincidence conjecture -- a conjecture that aims to measure the limits of AI interpretability.
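The zero-mean condition in the abstract can be checked numerically for a given activation. The following is a minimal sketch, not taken from the paper: it approximates $\mathbb{E}_{z \sim \mathcal{N}(0,1)}[\sigma(z)]$ with a Riemann sum on a dense grid. The names `gaussian_mean`, `relu`, and `shifted_relu` are hypothetical, and the shift $1/\sqrt{2\pi}$ is the standard closed form for $\mathbb{E}[\mathrm{ReLU}(z)]$, assumed here as one possible additive shift.

```python
import numpy as np

# Dense symmetric grid; the N(0,1) density is negligible beyond |z| = 10.
Z = np.linspace(-10.0, 10.0, 200_001)
DZ = Z[1] - Z[0]
PDF = np.exp(-0.5 * Z**2) / np.sqrt(2.0 * np.pi)


def gaussian_mean(sigma):
    """Approximate E_{z ~ N(0,1)}[sigma(z)] with a Riemann sum on the grid."""
    return float(np.sum(sigma(Z) * PDF) * DZ)


def relu(z):
    return np.maximum(z, 0.0)


# Closed form for E[ReLU(z)] under N(0,1); used as the additive shift
# (an assumption for illustration, not the paper's stated construction).
RELU_MEAN = 1.0 / np.sqrt(2.0 * np.pi)


def shifted_relu(z):
    """ReLU shifted down so that its Gaussian mean is zero."""
    return relu(z) - RELU_MEAN


if __name__ == "__main__":
    for name, sigma in [("relu", relu), ("tanh", np.tanh), ("shifted_relu", shifted_relu)]:
        print(f"{name:>12}: E[sigma(z)] ~ {gaussian_mean(sigma):+.6f}")
```

Under these assumptions, plain ReLU has a strictly positive Gaussian mean, while tanh (an odd function) and the shifted ReLU have mean approximately zero, matching the examples listed in the abstract.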

Paper Structure

This paper contains 11 sections, 6 theorems, 47 equations, 1 figure.

Key Result

Theorem 3.1

For a neural network as defined in sec:arch with random weights and a fixed set of inputs $\mathcal{D} = \{x_{i;\alpha}\}$ (where $x_{i;\alpha} \in \mathbb{R}$ is index $i$ of datapoint $\alpha$) and in the nondegenerate case when the covariance matrix is invertible, the preactivations at every layer [...] Here, the terms $K^{(\ell)}_{\alpha_1 \alpha_2}$, $H^{(\ell)}_{\alpha_1\alpha_2}$, and $J^{(\ell)}_{\ldots}$ [...]

Figures (1)

  • Figure 1: Graphs of $K^{(\ell+1)}_{\alpha_1 \alpha_2} = \mathcal{C}(K^{(\ell)}_{\alpha_1 \alpha_2})$ when the activation function is ReLU (left), $\tanh(4x)$ (center), or a shifted ReLU (right). The $y=x$ line in dotted blue is included for comparison. For ReLU, the repeated application of $\mathcal{C}$ brings initial points towards the stable fixed point at $K^{(\ell)}_{\alpha_1 \alpha_2}=1$. This means the preactivations on different inputs become more and more correlated as depth increases. For tanh (for which the $4x$ scaling was chosen to emphasize the effect and doesn't change it qualitatively), the repeated application of $\mathcal{C}$ brings initial points towards 0 unless they start at $1$ or $-1$. This means preactivations on different inputs usually become less and less correlated as depth increases, but they stay identical up to a sign if they started that way. A similar effect occurs for a ReLU shifted to have zero mean under the Gaussian measure.
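The qualitative behavior described in the Figure 1 caption can be reproduced with a short numerical experiment. The sketch below is an illustration under stated assumptions rather than the paper's exact map $\mathcal{C}$: it assumes unit-variance preactivations, no biases, and a normalization chosen so that correlation 1 maps to 1, i.e. $\rho \mapsto \mathbb{E}[\sigma(u)\sigma(v)] / \mathbb{E}[\sigma(z)^2]$ for standard Gaussians $u, v$ with correlation $\rho$. The names `correlation_map` and `iterate` are hypothetical, and the expectation is approximated on a two-dimensional grid.

```python
import numpy as np

# 1D grid and Gaussian weights; the N(0,1) tail mass beyond |x| = 6 is negligible.
GRID = np.linspace(-6.0, 6.0, 601)
H = GRID[1] - GRID[0]
W1 = np.exp(-0.5 * GRID**2) / np.sqrt(2.0 * np.pi) * H
X, Y = np.meshgrid(GRID, GRID, indexing="ij")
W2 = np.outer(W1, W1)  # weights for two independent standard Gaussians


def correlation_map(sigma, rho):
    """Approximate E[sigma(u) sigma(v)] / E[sigma(z)^2] for corr(u, v) = rho.

    Assumed normalization (not necessarily the paper's): unit-variance
    preactivations, no bias, scaled so that rho = 1 maps to about 1.
    """
    rho = float(np.clip(rho, -1.0, 1.0))
    u = X
    v = rho * X + np.sqrt(1.0 - rho**2) * Y  # v has unit variance, corr(u, v) = rho
    second_moment = np.sum(W1 * sigma(GRID) ** 2)
    return float(np.sum(W2 * sigma(u) * sigma(v)) / second_moment)


def iterate(sigma, rho0, depth):
    """Apply the correlation map `depth` times, starting from rho0."""
    rho = rho0
    for _ in range(depth):
        rho = correlation_map(sigma, rho)
    return rho


def relu(z):
    return np.maximum(z, 0.0)


def tanh4(z):
    return np.tanh(4.0 * z)


def shifted_relu(z):
    # ReLU minus its Gaussian mean 1/sqrt(2*pi), so E[sigma(z)] = 0.
    return relu(z) - 1.0 / np.sqrt(2.0 * np.pi)


if __name__ == "__main__":
    for name, sigma in [("relu", relu), ("tanh(4x)", tanh4), ("shifted_relu", shifted_relu)]:
        print(f"{name:>12}: correlation after 50 layers = {iterate(sigma, 0.5, 50):.4f}")
```

Starting from an initial correlation of 0.5, the ReLU iterates should drift towards 1, while the two zero-mean activations should decay towards 0, in line with the caption. The approach to 1 for ReLU is slow, so a moderate depth leaves the correlation close to, but visibly below, 1.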

Theorems & Definitions (13)

  • Theorem 3.1: roberts22_pdlt
  • Theorem 4.1
  • Corollary 4.1
  • proof
  • Lemma 4.1
  • proof
  • Lemma 4.2
  • proof
  • Lemma 4.3
  • proof
  • ...and 3 more