Wide Neural Networks as a Baseline for the Computational No-Coincidence Conjecture

John Dunbar; Scott Aaronson

TL;DR

The paper asks whether random neural networks can exhibit no-coincidence behavior, analogous to rare but detectable structural properties in circuits. It develops a first-order perturbative framework and uses Hermite–Mehler analysis to show that wide networks whose activation is nonlinear and has zero mean under the Gaussian measure produce outputs that become nearly independent as depth increases. The key insight is that, when the activation has zero mean, iterating the layer-to-layer map on input covariances drives them to the fixed point at zero, so the outputs behave like those of a random function in the large-width limit. This provides a practical baseline for the computational no-coincidence conjecture and motivates a neural-network version of it. The paper highlights tanh-like zero-mean activations and outlines limitations and future directions for interpretability research.
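The decay of the covariance map described above can be illustrated numerically. The sketch below is not the paper's construction: it assumes pre-activations are kept at unit variance (weight variance rescaled by $1/\mathbb{E}[\sigma(z)^2]$, no biases), and the helper names `make_grid`, `correlation_map`, and `iterate`, along with the grid size and depth, are hypothetical choices made for illustration. It iterates the map $\rho \mapsto \mathbb{E}[\sigma(u)\sigma(v)] / \mathbb{E}[\sigma(u)^2]$, where $(u, v)$ are standard Gaussians with correlation $\rho$, for tanh (zero mean) and plain ReLU (nonzero mean).

```python
import math

def make_grid(half_width=6.0, n=81):
    """Nodes and Gaussian-weighted trapezoid weights on [-half_width, half_width]."""
    h = 2.0 * half_width / (n - 1)
    nodes = [-half_width + i * h for i in range(n)]
    weights = [h * math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi) for z in nodes]
    weights[0] *= 0.5   # trapezoid rule halves the two endpoint weights
    weights[-1] *= 0.5
    return nodes, weights

NODES, WEIGHTS = make_grid()

def correlation_map(act, rho):
    """E[act(u) act(v)] / E[act(u)^2] for unit-variance Gaussians with correlation rho.

    Assumption: unit pre-activation variance at every layer (illustrative only).
    """
    s = math.sqrt(max(0.0, 1.0 - rho * rho))
    cross = 0.0
    second = 0.0
    for z1, w1 in zip(NODES, WEIGHTS):
        a1 = act(z1)
        second += w1 * a1 * a1
        # v = rho * z1 + s * z2 has unit variance and correlation rho with u = z1
        inner = sum(w2 * act(rho * z1 + s * z2) for z2, w2 in zip(NODES, WEIGHTS))
        cross += w1 * a1 * inner
    return cross / second

def iterate(act, rho0, depth):
    """Apply the correlation map `depth` times, returning every intermediate value."""
    rhos = [rho0]
    for _ in range(depth):
        rhos.append(correlation_map(act, rhos[-1]))
    return rhos

def relu(z):
    return max(z, 0.0)

tanh_path = iterate(math.tanh, 0.5, 20)
relu_path = iterate(relu, 0.5, 20)

for layer in (0, 5, 10, 20):
    print(f"layer {layer:2d}: tanh rho = {tanh_path[layer]:.4f}, relu rho = {relu_path[layer]:.4f}")
```

Under these assumptions the tanh correlation shrinks at every layer, while the ReLU value climbs toward 1, which matches the distinction the summary draws between zero-mean and non-zero-mean activations.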

Abstract

We establish that randomly initialized neural networks, with large width and a natural choice of hyperparameters, have nearly independent outputs exactly when their activation function is nonlinear with zero mean under the Gaussian measure: $\mathbb{E}_{z \sim \mathcal{N}(0,1)}[\sigma(z)]=0$. For example, this includes ReLU and GeLU with an additive shift, as well as tanh, but not ReLU or GeLU by themselves. Because of their nearly independent outputs, we propose neural networks with zero-mean activation functions as a promising candidate for the Alignment Research Center's computational no-coincidence conjecture -- a conjecture that aims to measure the limits of AI interpretability.
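The zero-mean condition in the abstract is easy to check numerically for a given activation. The following is a minimal sketch using only the Python standard library; the helper `gaussian_mean`, the integration range, and the grid size are illustrative assumptions, not anything specified by the paper. It approximates $\mathbb{E}_{z \sim \mathcal{N}(0,1)}[\sigma(z)]$ with the trapezoid rule for tanh, ReLU, and ReLU shifted by its Gaussian mean $1/\sqrt{2\pi}$.

```python
import math

def gaussian_mean(f, half_width=10.0, n=4001):
    """Approximate E[f(z)] for z ~ N(0, 1) with the trapezoid rule on [-half_width, half_width]."""
    h = 2.0 * half_width / (n - 1)
    total = 0.0
    for i in range(n):
        z = -half_width + i * h
        w = 0.5 if i in (0, n - 1) else 1.0  # endpoint weights are halved
        total += w * f(z) * math.exp(-0.5 * z * z)
    return total * h / math.sqrt(2.0 * math.pi)

def relu(z):
    return max(z, 0.0)

# Closed form: E[ReLU(z)] = integral of z * phi(z) over z > 0 = 1 / sqrt(2 * pi)
RELU_MEAN = 1.0 / math.sqrt(2.0 * math.pi)

def shifted_relu(z):
    """ReLU with an additive shift chosen so that its Gaussian mean is zero."""
    return relu(z) - RELU_MEAN

means = {
    "tanh": gaussian_mean(math.tanh),
    "relu": gaussian_mean(relu),
    "shifted_relu": gaussian_mean(shifted_relu),
}

for name, value in means.items():
    print(f"{name:>12}: E[sigma(z)] = {value:+.6f}")
```

tanh passes the check because it is odd, plain ReLU fails it, and subtracting the constant $1/\sqrt{2\pi}$ restores a zero mean. An analogous shift for GeLU would use its own Gaussian mean, which this sketch does not compute.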
