Quantifying the Necessity of Chain of Thought through Opaque Serial Depth

Jonah Brown-Cohen, David Lindner, Rohin Shah

TL;DR

This paper formalizes opaque serial depth: the length of the longest computation a model can perform without interpretable intermediate steps such as chain of thought. The results suggest that opaque serial depth is a useful tool for understanding how much significant reasoning a model could do without externalizing it.

Abstract

Large language models (LLMs) tend to externalize their reasoning in their chain of thought, making the chain of thought a good target for monitoring. This is partially an inherent feature of the Transformer architecture: sufficiently long serial cognition must pass through the chain of thought (Korbak et al., 2025). We formalize this argument through the notion of opaque serial depth, given by the length of the longest computation that can be done without the use of interpretable intermediate steps like chain of thought. Given this formalization, we compute numeric upper bounds on the opaque serial depth of Gemma 3 models, as well as asymptotic results for additional architectures beyond standard LLMs. We also open-source an automated method that can calculate upper bounds on the opaque serial depth of arbitrary neural networks, and use it to demonstrate that Mixture-of-Experts models likely have lower depth than dense models. Overall, our results suggest that opaque serial depth is a useful tool for understanding the potential for models to do significant reasoning that is not externalized.

Paper Structure

This paper contains 47 sections, 4 equations, 5 figures, 4 tables, and 1 algorithm.

Figures (5)

  • Figure 1: Adapted from Korbak et al. (2025). For Transformers, chain of thought is the only way to pass information from later layers to earlier layers, making it a bottleneck for serial computation. As a result, for tasks that require sufficiently long serial computation, the model will have to externalize some of its reasoning in the chain of thought.
  • Figure 2: Serial depth of a circuit computing an MLP with two hidden layers, given a single input and a single output. The highlighted path is the longest in the circuit and has length 9, which determines the serial depth. The paths from the weights $W_2$ and $W_3$ to the output are shorter, and so do not contribute to the serial depth.
  • Figure 3: Asymptotic analysis of opaque serial depth for various architectures shows that architectural choices can make a large difference to the amount of opaque serial computation that can be done.
  • Figure 4: Manual optimizations can produce tighter upper bounds on the serial depth of a neural network. When computing $\bm{w} \cdot \bm{x} + b$ for vectors of dimension 6, an automated calculation can produce a depth of 5, but folding the bias addition into the dot-product sum produces a shallower circuit of depth 4.
  • Figure 5: Automatic depth calculations for Gemma models across different sequence lengths.
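
The depth figures quoted in the Figure 4 caption can be reproduced with a small sketch. It is not the paper's open-sourced tool. It assumes a simplified gate model that this page does not state: every multiplication and addition is a two-input gate of depth 1, inputs (weights, activations, bias) have depth 0, and sums are arranged as a balanced binary tree. The names `Node`, `leaf`, `gate`, `tree_sum`, `naive_affine_depth`, and `folded_affine_depth` are hypothetical and exist only for this illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    """A circuit node, tracked only by its serial depth.

    depth = number of gates on the longest path from any input to this node.
    """
    depth: int


def leaf() -> Node:
    # An input (weight, activation, or bias): no gates precede it.
    return Node(0)


def gate(*inputs: Node) -> Node:
    # ASSUMPTION: each gate adds exactly 1 to the longest incoming path.
    return Node(1 + max(n.depth for n in inputs))


def tree_sum(terms: list[Node]) -> Node:
    """Sum the terms with a balanced tree of two-input additions."""
    while len(terms) > 1:
        # Pair up neighbours; an unpaired last term is carried to the next round.
        paired = [gate(terms[i], terms[i + 1]) for i in range(0, len(terms) - 1, 2)]
        if len(terms) % 2:
            paired.append(terms[-1])
        terms = paired
    return terms[0]


def naive_affine_depth(dim: int) -> int:
    """Depth of w.x + b when the bias is added after the dot product is finished."""
    products = [gate(leaf(), leaf()) for _ in range(dim)]  # w_i * x_i
    dot = tree_sum(products)
    return gate(dot, leaf()).depth  # separate "+ b" gate


def folded_affine_depth(dim: int) -> int:
    """Depth of w.x + b when the bias is one more term in the summation tree."""
    products = [gate(leaf(), leaf()) for _ in range(dim)]  # w_i * x_i
    return tree_sum(products + [leaf()]).depth


if __name__ == "__main__":
    print(naive_affine_depth(6), folded_affine_depth(6))
```

Under these assumptions, dimension 6 gives depth 5 for the separate bias addition and depth 4 for the folded version, matching the caption. The saving depends on the dimension: for dimension 8 both variants come out at depth 5 in this model, because the ninth term pushes the summation tree one level deeper.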