CoT Information: Improved Sample Complexity under Chain-of-Thought Supervision

Awni Altabaa, Omar Montasser, John Lafferty

TL;DR

The paper develops a statistical theory for learning under chain-of-thought supervision by introducing the CoT information measure, which quantifies how informative CoT traces are for distinguishing end-to-end behavior. It proves that the number of samples needed to reach end-to-end error $\epsilon$ under CoT supervision scales as $m = O(d / \mathcal{I}_{\mathcal{D}, h_\star}^{\mathrm{CoT}}(\epsilon; \mathcal{H}))$, which can be far smaller than the standard PAC requirement. The authors provide upper bounds for realizable and agnostic settings (finite and infinite hypothesis spaces) and establish information-theoretic lower bounds showing the fundamental role of CoT information in the learning problem. Simulations on DFA-like and autoregressive-style CoT classes validate the theory, showing substantial empirical gains in sample efficiency when the CoT information is large. Overall, the CoT information framework offers a principled way to quantify the value of chain-of-thought supervision for faster learning and generalization.

Abstract

Learning complex functions that involve multi-step reasoning poses a significant challenge for standard supervised learning from input-output examples. Chain-of-thought (CoT) supervision, which provides intermediate reasoning steps together with the final output, has emerged as a powerful empirical technique, underpinning much of the recent progress in the reasoning capabilities of large language models. This paper develops a statistical theory of learning under CoT supervision. A key characteristic of the CoT setting, in contrast to standard supervision, is the mismatch between the training objective (CoT risk) and the test objective (end-to-end risk). A central part of our analysis, distinguished from prior work, is explicitly linking those two types of risk to achieve sharper sample complexity bounds. This is achieved via the *CoT information measure* $\mathcal{I}_{\mathcal{D}, h_\star}^{\mathrm{CoT}}(\epsilon; \mathcal{H})$, which quantifies the additional discriminative power gained from observing the reasoning process. The main theoretical results demonstrate how CoT supervision can yield significantly faster learning rates compared to standard E2E supervision. Specifically, it is shown that the sample complexity required to achieve a target E2E error $\epsilon$ scales as $d/\mathcal{I}_{\mathcal{D}, h_\star}^{\mathrm{CoT}}(\epsilon; \mathcal{H})$, where $d$ is a measure of hypothesis class complexity, which can be much faster than standard $d/\epsilon$ rates. Information-theoretic lower bounds in terms of the CoT information are also obtained. Together, these results suggest that CoT information is a fundamental measure of statistical complexity for learning under chain-of-thought supervision.
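To make the rate comparison concrete, the following minimal sketch contrasts the two scalings numerically. It is an illustration only: the bound shape $(d + \log(1/\delta))/\text{rate}$ with the constant set to 1, the helper name `samples_needed`, and the values of `d`, `eps`, and `cot_info` are all assumptions chosen for the example, not quantities taken from the paper.

```python
import math


def samples_needed(d: float, rate: float, delta: float = 0.05) -> int:
    """Illustrative sample size of the form (d + log(1/delta)) / rate.

    Assumption: the hidden constant is set to 1 and log factors are ignored,
    so only the ratio between two calls is meaningful.
    """
    if rate <= 0:
        raise ValueError("rate must be positive")
    return math.ceil((d + math.log(1.0 / delta)) / rate)


# Hypothetical numbers, chosen only to show the effect of the denominator.
d = 50.0         # complexity of the hypothesis class (e.g. log|H| or a VC-type quantity)
eps = 0.01       # target end-to-end error
cot_info = 0.5   # assumed value of the CoT information at eps

m_e2e = samples_needed(d, eps)       # standard d / eps scaling
m_cot = samples_needed(d, cot_info)  # d / CoT-information scaling

print(f"E2E supervision: ~{m_e2e} samples")
print(f"CoT supervision: ~{m_cot} samples")
print(f"ratio: {m_e2e / m_cot:.1f}x")
```

Under these assumed values the gain is simply the ratio `cot_info / eps`; when the CoT information is close to $\epsilon$, the two sample sizes coincide and CoT supervision gives no statistical advantage in this sketch.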

Paper Structure

This paper contains 29 sections, 4 theorems, 114 equations, 7 figures, 1 table.

Key Result

lemma 1

Let $\mathcal{H} \subset (\mathcal{Y} \times \mathcal{Z})^{\mathcal{X}}$ be a CoT hypothesis class. Then the CoT information $\mathcal{I}_{\mathcal{D}, h_\star}^{\mathrm{CoT}}(\epsilon; \mathcal{H})$ satisfies the following properties:

Figures (7)

  • Figure 1: An illustration of standard end-to-end supervision and CoT supervision. Our theoretical framework is aimed at understanding tradeoffs between end-to-end supervision and CoT supervision, and in particular, how the potentially richer information in the CoT signal can result in faster learning rates.
  • Figure 2: Illustration of the statistical advantage of CoT supervision in terms of the geometry of the CoT consistency rule with respect to end-to-end error. When the CoT is informative (i.e., $\mathcal{I}_{\mathcal{D}, h_\star}^{\mathrm{CoT}}(\epsilon; \mathcal{H}) > \epsilon$), CoT supervision enables the construction of a tighter consistency set, which leads to smaller end-to-end error and more sample-efficient learning.
  • Figure 3: Numerical experiments for deterministic finite automata CoT hypothesis class.
  • Figure 4: Numerical experiments for iterated linear thresholds CoT hypothesis class.
  • Figure 5: The state transition graph of the DFA corresponding to the target hypothesis $h_\star$.
  • ...and 2 more figures

Theorems & Definitions (20)

  • definition 1: Realizable chain-of-thought PAC learning
  • definition 2: Agnostic chain-of-thought PAC learning
  • definition 3: CoT information
  • lemma 1
  • proof
  • proof
  • lemma 2: Relating CoT performance to E2E performance via CoT Information
  • proof
  • proof: Proof of \ref{result:cotcons_cotinfo_infH}
  • lemma 3: Relating CoT performance to E2E performance in the Agnostic Setting
  • ...and 10 more