
Towards a Law of Iterated Expectations for Heuristic Estimators

Paul Christiano, Jacob Hilton, Andrea Lincoln, Eric Neyman, Mark Xu

Abstract

Christiano et al. (2022) define a *heuristic estimator* to be a hypothetical algorithm that estimates the values of mathematical expressions from arguments. In brief, a heuristic estimator $\mathbb{G}$ takes as input a mathematical expression $Y$ and a formal "heuristic argument" $π$, and outputs an estimate $\mathbb{G}(Y \mid π)$ of $Y$. In this work, we argue for the informal principle that a heuristic estimator ought not to be able to predict its own errors, and we explore approaches to formalizing this principle. Most simply, the principle suggests that $\mathbb{G}(Y - \mathbb{G}(Y \mid π) \mid π)$ ought to equal zero for all $Y$ and $π$. We argue that an ideal heuristic estimator ought to satisfy two stronger properties in this vein, which we term *iterated estimation* (by analogy to the law of iterated expectations) and *error orthogonality*. Although iterated estimation and error orthogonality are intuitively appealing, it can be difficult to determine whether a given heuristic estimator satisfies the properties. As an alternative approach, we explore *accuracy*: a property that (roughly) states that $\mathbb{G}$ has zero average error over a distribution of mathematical expressions. However, in the context of two estimation problems, we demonstrate barriers to creating an accurate heuristic estimator. We finish by discussing challenges and potential paths forward for finding a heuristic estimator that accords with our intuitive understanding of how such an estimator ought to behave, as well as the potential applications of heuristic estimators to understanding the behavior of neural networks.
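The principle that an estimator should not be able to predict its own errors has a familiar counterpart in ordinary conditional expectation, which is the analogy behind the name *iterated estimation*. The sketch below is only an illustration of that analogy, not a heuristic estimator. It uses a made-up finite probability space in which sub-$\sigma$-algebras $\mathcal{H}' \subseteq \mathcal{H}$ are generated by partitions. With exact rational arithmetic it checks three standard identities:

- the conditional expectation of the error $Y - \mathbb{E}[Y \mid \mathcal{H}]$ given $\mathcal{H}$ is zero;
- the law of iterated expectations $\mathbb{E}[\mathbb{E}[Y \mid \mathcal{H}] \mid \mathcal{H}'] = \mathbb{E}[Y \mid \mathcal{H}']$ holds;
- the error is orthogonal to $\mathbb{E}[X \mid \mathcal{H}']$.

The probabilities, partitions, and the values of $X$ and $Y$ are all hypothetical.

```python
from fractions import Fraction

# Hypothetical finite probability space: six outcomes with exact rational
# probabilities that sum to 1.
P = [Fraction(n, 12) for n in (1, 2, 3, 1, 4, 1)]
OMEGA = range(len(P))

# On a finite space, a sigma-algebra is generated by a partition. Each
# partition is stored as the block label of every outcome. H_PRIME is coarser
# than H (each block of H lies inside a block of H_PRIME), so sigma(H') is a
# subset of sigma(H).
H = [0, 0, 1, 2, 2, 3]
H_PRIME = [0, 0, 0, 1, 1, 1]


def expect(Z):
    """Unconditional expectation E[Z]."""
    return sum(P[w] * Z[w] for w in OMEGA)


def cond_exp(Z, labels):
    """E[Z | sigma(labels)], returned as a list indexed by outcome."""
    out = []
    for w in OMEGA:
        block = [v for v in OMEGA if labels[v] == labels[w]]
        mass = sum(P[v] for v in block)
        out.append(sum(P[v] * Z[v] for v in block) / mass)
    return out


X = [Fraction(x) for x in (3, -1, 4, 1, -5, 9)]
Y = [Fraction(y) for y in (2, 7, 1, 8, 2, 8)]

EY_H = cond_exp(Y, H)
err = [y - e for y, e in zip(Y, EY_H)]

# 1. The estimator's estimate of its own error is zero: E[Y - E[Y|H] | H] = 0.
assert all(v == 0 for v in cond_exp(err, H))

# 2. Law of iterated expectations: E[E[Y|H] | H'] = E[Y|H'].
assert cond_exp(EY_H, H_PRIME) == cond_exp(Y, H_PRIME)

# 3. Orthogonality: the error is uncorrelated with E[X|H'],
#    because E[X|H'] is H-measurable.
assert expect([e * x for e, x in zip(err, cond_exp(X, H_PRIME))]) == 0

# The error can still correlate with quantities that are not H-measurable,
# such as X itself.
print(expect([e * x for e, x in zip(err, X)]))  # prints 58/45
```

The nonzero value printed at the end shows why orthogonality must be stated against coarser estimates. It is not stated against arbitrary quantities.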

Paper Structure

This paper contains 35 sections, 10 theorems, 83 equations, 1 figure, 2 tables, and 1 algorithm.

Key Result

Proposition 2.3

Let $(\Omega, \mathcal{F}, \mathbb{P})$ be a probability space with sub-$\sigma$-algebras $\mathcal{H}' \subseteq \mathcal{H} \subseteq \mathcal{F}$. Let $X, Y$ be random variables satisfying $\mathbb{E}[X^2], \mathbb{E}[Y^2] < \infty$. Then

Figures (1)

  • Figure 1: Let $\mathcal{Y}$ be the space of expressions of the form $2 \cdot c_1 + 3 \cdot c_2$, where $c_1, c_2 \in \mathbb{R}$, and let $\mathcal{D}$ be the distribution over $\mathcal{Y}$ obtained by selecting $c_1, c_2$ independently from $\mathcal{N}(0, 1)$. This figure classifies estimators of $Y \in \mathcal{Y}$ based on whether they are $1$-accurate, $c_1$-accurate, and self-accurate over $\mathcal{D}$.
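The classification in Figure 1 can be checked exactly for a restricted family of estimators. The sketch below assumes a reading of the three accuracy notions that is not quoted from the paper. With error $e = Y - G(Y)$, it takes *1-accurate* to mean $\mathbb{E}[e] = 0$, *$c_1$-accurate* to mean $\mathbb{E}[e \cdot c_1] = 0$, and *self-accurate* to mean $\mathbb{E}[e \cdot G(Y)] = 0$. It also restricts attention to hypothetical linear estimators $G = a c_1 + b c_2 + k$. For these, every expectation reduces to the moments $\mathbb{E}[c_i] = 0$ and $\mathbb{E}[c_i c_j] = \delta_{ij}$, so no sampling is needed.

```python
from dataclasses import dataclass

# ASSUMED definitions (our reading, not quoted from the paper), with e = Y - G(Y):
#   1-accurate:    E[e] = 0
#   c1-accurate:   E[e * c1] = 0
#   self-accurate: E[e * G(Y)] = 0
# Setting from Figure 1: Y = 2*c1 + 3*c2, with c1, c2 ~ N(0, 1) independent.


@dataclass(frozen=True)
class LinearEstimator:
    """Hypothetical estimator G = a*c1 + b*c2 + k."""
    a: float  # coefficient on c1
    b: float  # coefficient on c2
    k: float  # constant term


def classify(g: LinearEstimator) -> dict[str, bool]:
    # Error e = (2 - a)*c1 + (3 - b)*c2 + (-k).
    ea, eb, ek = 2 - g.a, 3 - g.b, -g.k
    # Use E[c_i] = 0 and E[c_i * c_j] = 1 if i == j else 0.
    mean_err = ek                                # E[e]
    err_c1 = ea                                  # E[e * c1]
    err_self = ea * g.a + eb * g.b + ek * g.k    # E[e * G]
    # Comparisons with 0 are exact for integer coefficients.
    return {
        "1-accurate": mean_err == 0,
        "c1-accurate": err_c1 == 0,
        "self-accurate": err_self == 0,
    }


examples = {
    "G = 0": LinearEstimator(0, 0, 0),
    "G = c1": LinearEstimator(1, 0, 0),
    "G = 2*c1": LinearEstimator(2, 0, 0),
    "G = 2*c1 + 1": LinearEstimator(2, 0, 1),
    "G = 2*c1 + 3*c2": LinearEstimator(2, 3, 0),
}
for name, g in examples.items():
    print(name, classify(g))
```

Under these assumed definitions, the three properties are independent. For example, $G = 0$ is 1-accurate and self-accurate but not $c_1$-accurate. $G = 2c_1 + 1$ is $c_1$-accurate but neither 1-accurate nor self-accurate.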

Theorems & Definitions (43)

  • Definition 2.1
  • Example 2.2
  • Proposition 2.3: Projection law of conditional expectations (see, e.g., Moshayedi 2022)
  • Definition 2.4
  • Example 2.5
  • Definition 3.1
  • Example 3.2
  • Remark 3.3
  • Proposition 3.4
  • Proof