
Statistical Guarantees for Reasoning Probes on Looped Boolean Circuits

Anastasis Kratsios, Giulia Livieri, A. Martina Neuman

TL;DR

The paper addresses the problem of statistically evaluating reasoning probes that interrogate looped Boolean circuits under partial observability. It introduces a GCN-based probing framework in which probe outputs live in the interior of the $m$-simplex and uncertainty is modeled via the Aitchison geometry, coupled with a hitting-probability metric on a strongly connected digraph derived from looped execution. The main result is a transductive generalization bound: with $N$ observed nodes, the worst-case generalization error decays at the optimal rate $\mathcal{O}\big(\sqrt{\log(2/\delta)}/\sqrt{N}\big)$ with probability at least $1-\delta$, and this rate is independent of the graph size thanks to a one-dimensional snowflake embedding of the induced graph metric. The work also provides Lipschitz estimates for GCNs on digraphs and develops a proof strategy based on metric embeddings, linking circuit structure to statistical efficiency under partial access.
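To make the Aitchison geometry concrete: a standard way to compare two points in the interior of the simplex is the Aitchison distance, i.e. the Euclidean distance between their centred log-ratio (clr) images. The sketch below is a minimal illustration under that standard definition only; the function names, the gate labels and the probability vectors are hypothetical, and the paper's exact loss or normalisation may differ.

```python
import math


def clr(p):
    """Centred log-ratio transform of a strictly positive vector.

    clr(p)_i = log p_i - (1/m) * sum_j log p_j.  The log is undefined at 0,
    so clr only makes sense on the interior of the simplex.
    """
    if any(pi <= 0 for pi in p):
        raise ValueError("clr requires strictly positive entries (simplex interior)")
    logs = [math.log(pi) for pi in p]
    mean_log = sum(logs) / len(logs)
    return [v - mean_log for v in logs]


def aitchison_distance(p, q):
    """Aitchison distance: Euclidean distance between clr(p) and clr(q)."""
    if len(p) != len(q):
        raise ValueError("p and q must have the same length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(clr(p), clr(q))))


# Hypothetical probe outputs over m = 3 admissible gates, e.g. (AND, OR, XOR).
uniform = [1 / 3, 1 / 3, 1 / 3]
confident = [0.5, 0.25, 0.25]
print(aitchison_distance(uniform, confident))  # log(2) * sqrt(2/3), about 0.566
```

Because clr diverges as any coordinate tends to 0, distances to near-degenerate distributions grow without bound; this is consistent with restricting probe outputs to the interior of the simplex. The distance also depends only on ratios of coordinates, so rescaling a vector leaves it unchanged.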

Abstract

We study the statistical behaviour of reasoning probes in a stylized model of looped reasoning, given by Boolean circuits whose computational graph is a perfect $\nu$-ary tree ($\nu\ge 2$) and whose output is appended to the input and fed back iteratively for subsequent computation rounds. A reasoning probe has access to a sampled subset of internal computation nodes, possibly without covering the entire graph, and seeks to infer which $\nu$-ary Boolean gate is executed at each queried node, representing uncertainty via a probability distribution over a fixed collection of $\mathtt{m}$ admissible $\nu$-ary gates. This partial observability induces a generalization problem, which we analyze in a realizable, transductive setting. We show that, when the reasoning probe is parameterized by a graph convolutional network (GCN)-based hypothesis class and queries $N$ nodes, the worst-case generalization error attains the optimal rate $\mathcal{O}(\sqrt{\log(2/\delta)}/\sqrt{N})$ with probability at least $1-\delta$, for $\delta\in (0,1)$. Our analysis combines snowflake metric embedding techniques with tools from statistical optimal transport. A key insight is that this optimal rate is achievable independently of graph size, owing to the existence of a low-distortion one-dimensional snowflake embedding of the induced graph metric. As a consequence, our results provide a sharp characterization of how structural properties of the computational graph govern the statistical efficiency of reasoning under partial access.
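As a worked illustration of the rate, assume as a placeholder that the bound reads $C\sqrt{\log(2/\delta)}/\sqrt{N}$ for some constant $C$. The constant is not stated in this summary and may depend on other parameters of the setting, so $C=1$ below is purely illustrative, and the helper names are hypothetical. Inverting the bound gives the number of queried nodes needed for a target error $\varepsilon$: $N \ge C^2\log(2/\delta)/\varepsilon^2$.

```python
import math


def rate_bound(n, delta, c=1.0):
    """Illustrative bound c * sqrt(log(2/delta)) / sqrt(n).

    c stands in for the unspecified constant hidden in the O(.) notation.
    The number of nodes in the circuit graph does not appear anywhere.
    """
    if n < 1:
        raise ValueError("n must be a positive integer")
    if not 0 < delta < 1:
        raise ValueError("delta must lie in (0, 1)")
    return c * math.sqrt(math.log(2 / delta)) / math.sqrt(n)


def nodes_needed(eps, delta, c=1.0):
    """Smallest integer n with c * sqrt(log(2/delta) / n) <= eps."""
    if eps <= 0:
        raise ValueError("eps must be positive")
    return math.ceil(c ** 2 * math.log(2 / delta) / eps ** 2)


# Quadrupling the number of queried nodes halves the bound.
print(rate_bound(400, 0.05) / rate_bound(1600, 0.05))  # about 2.0
print(nodes_needed(eps=0.05, delta=0.05))              # 1476
```

The only inputs are $N$, $\delta$ and the constant; the graph size never enters, which mirrors the claim that the rate is independent of graph size. Tightening the confidence level (smaller $\delta$) costs only a logarithmic factor.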

Paper Structure (22 sections, 6 theorems, 104 equations, 4 figures)

Key Result

Theorem 3.1

Let $\alpha\in (0,1)$. Let $t,N\in \mathbb{N}$. For every $\delta\in (0,1)$, the following event holds with probability at least $1-\delta$

Figures (4)

  • Figure 2: Illustration of a one-dimensional embedding of a finite metric space; the objective is to keep metric distortion small.
  • Figure: (a) Looped reasoning model
  • Figure: (b) Strongly connected digraph $\mathcal{G}^{\mathrm{time}}/\mathbb{N}_{\ge 0}$
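Figure 2's objective, a one-dimensional embedding with small metric distortion, can be made concrete. For an injective map $f$ from a finite metric space $(X,d)$ into $\mathbb{R}$, the distortion of $f$ on the $\alpha$-snowflake $(X,d^\alpha)$ is the product of its largest expansion and its largest contraction over pairs of points. The sketch below only evaluates this quantity for a given candidate map; it does not search for an optimal embedding. The star graph, the map `f` and all names are hypothetical toy choices, using a symmetric shortest-path metric rather than the paper's hitting-probability metric on a digraph.

```python
import itertools


def distortion(points, d, f, alpha=1.0):
    """Distortion of f: (X, d**alpha) -> R, the alpha-snowflake mapped into the line.

    Over all pairs x != y:
      expansion   = max |f(x) - f(y)| / d(x, y)**alpha
      contraction = max d(x, y)**alpha / |f(x) - f(y)|
    and distortion = expansion * contraction (at least 1 for two or more points).
    alpha = 1 gives the original, un-snowflaked metric.
    """
    if not 0 < alpha <= 1:
        raise ValueError("alpha must lie in (0, 1]")
    expansion = contraction = 0.0
    for x, y in itertools.combinations(points, 2):
        gap = abs(f[x] - f[y])
        if gap == 0:
            return float("inf")  # a non-injective map has infinite distortion
        dist = d(x, y) ** alpha
        expansion = max(expansion, gap / dist)
        contraction = max(contraction, dist / gap)
    return expansion * contraction


def star_metric(x, y):
    """Shortest-path metric of a star with centre 'c': 1 centre-to-leaf, 2 leaf-to-leaf."""
    if x == y:
        return 0
    return 1 if "c" in (x, y) else 2


# Hypothetical toy data: a 3-leaf star and one candidate placement on the line.
star = ["c", "a", "b", "e"]
f = {"c": 0.0, "a": -1.0, "b": 1.0, "e": 2.0}
print(distortion(star, star_metric, f, alpha=1.0))  # 4.0
print(distortion(star, star_metric, f, alpha=0.5))  # about 3.0
```

For this particular map, passing to the $\alpha=1/2$ snowflake lowers the distortion from 4 to 3: raising distances to a power $\alpha<1$ compresses the ratio between large and small distances, which is one intuition for why snowflaked metrics are easier to embed into low dimensions.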

Theorems & Definitions (19)

  • Remark 2.1
  • Definition 2.1
  • Theorem 3.1: Main result
  • Proposition 4.1
  • Lemma A.1
  • Proposition A.1: Low distortion snowflake-embedding into $\mathbb{R}$
  • Proposition A.2
  • Proof of Proposition \ref{prop:independent embedding}
  • Proof of Lemma \ref{lem:New_Convergence__SuperAssouad}
  • Proof of Theorem 3.1
  • ...and 9 more