Tell Me Why: Incentivizing Explanations

Siddarth Srinivasan, Ezra Karger, Michiel Bakker, Yiling Chen

TL;DR

This work formalizes rationales as a mechanism to reveal informational overlaps among experts, enabling faster and more efficient Bayesian aggregation than belief-only reporting. It introduces a deliberation mechanism with a supervisor and experts that incentivizes truthful reporting of beliefs and rationales through proper scoring rules and a commitment to ignore nonrationale reports, yielding a perfect Bayesian equilibrium. The model shows how rationales refine the filtration of information, isolating new, shared, and old information components to improve collective forecasts. The framework has broad implications for judgmental forecasting, AI alignment, and scalable oversight, with extensions to markets, self-resolving variants, and LLM-enabled architectures.
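The incentive argument above rests on proper scoring rules. The sketch below assumes experts are paid with the standard logarithmic market scoring rule. The listing names a "Market Scoring Rules" definition and a "Log Scoring Rule" remark, but the paper's exact payment, the role of rationales, and the supervisor's commitment to ignore nonrationale reports are not modeled here. Under this rule, an expert who moves the forecast from `prev` to `report` is paid the change in log score once $Y$ is realized. All function names are hypothetical.

```python
import math

def log_score(p: float, y: int) -> float:
    """Log scoring rule: log of the probability that forecast p = P(Y=1) assigns to the realized y."""
    return math.log(p) if y == 1 else math.log(1.0 - p)

def market_payment(prev: float, report: float, y: int) -> float:
    """Log market scoring rule: pay the improvement in log score from moving prev -> report."""
    return log_score(report, y) - log_score(prev, y)

def expected_payment(belief: float, prev: float, report: float) -> float:
    """Expected payment for reporting `report`, under the expert's own belief P(Y=1) = belief."""
    return belief * market_payment(prev, report, 1) + (1.0 - belief) * market_payment(prev, report, 0)

if __name__ == "__main__":
    belief, prev = 0.7, 0.5
    grid = [i / 100 for i in range(1, 100)]  # candidate reports strictly inside (0, 1)
    best = max(grid, key=lambda r: expected_payment(belief, prev, r))
    print(best)  # 0.7: the truthful report maximizes expected payment
```

Because `prev` only shifts the payment by a constant that does not depend on the report, the truthful report is optimal whatever earlier experts said.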

Abstract

Common sense suggests that when individuals explain why they believe something, we can arrive at more accurate conclusions than when they simply state what they believe. Yet, there is no known mechanism that provides incentives to elicit explanations for beliefs from agents. This likely stems from the fact that standard Bayesian models make assumptions (like conditional independence of signals) that preempt the need for explanations, in order to show efficient information aggregation. A natural justification for the value of explanations is that agents' beliefs tend to be drawn from overlapping sources of information, so agents' belief reports do not reveal all that needs to be known. Indeed, this work argues that rationales (explanations of an agent's private information) lead to more efficient aggregation by allowing agents to efficiently identify what information they share and what information is new. Building on this model of rationales, we present a novel 'deliberation mechanism' to elicit rationales from agents in which truthful reporting of beliefs and rationales is a perfect Bayesian equilibrium.
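A toy example shows why belief reports alone can hide overlapping information. This is a hypothetical illustration, not the paper's model. It assumes evidence comes from discrete, conditionally independent "sources" (the names `shared_news`, `private_a`, `private_b` and all helper functions are invented here), and it simplifies a rationale to the set of sources an expert used.

```python
from typing import Dict, List, Set

# Toy model (illustrative assumption): each source k contributes an independent
# log-likelihood ratio llr[k] for Y=1 versus Y=0.

def expert_log_odds(prior: float, llr: Dict[str, float], sources: Set[str]) -> float:
    """Posterior log odds of an expert who observed exactly `sources`."""
    return prior + sum(llr[k] for k in sorted(sources))

def aggregate_beliefs_only(prior: float, reports: List[float]) -> float:
    """Belief-only aggregation that treats experts' evidence as conditionally
    independent: add every expert's update relative to the prior."""
    return prior + sum(r - prior for r in reports)

def aggregate_with_rationales(prior: float, llr: Dict[str, float], rationales: List[Set[str]]) -> float:
    """If each expert also names the sources behind its belief (a stand-in for a
    rationale), each distinct source is counted exactly once."""
    union: Set[str] = set().union(*rationales)
    return prior + sum(llr[k] for k in sorted(union))

if __name__ == "__main__":
    prior = 0.0
    llr = {"shared_news": 1.0, "private_a": 0.5, "private_b": 0.3}  # hypothetical sources
    seen = [{"shared_news", "private_a"}, {"shared_news", "private_b"}]
    reports = [expert_log_odds(prior, llr, s) for s in seen]
    print(round(aggregate_beliefs_only(prior, reports), 6))       # 2.8: shared_news counted twice
    print(round(aggregate_with_rationales(prior, llr, seen), 6))  # 1.8: each source counted once
```

With disjoint sources the two aggregators agree. Any overlap makes naive belief-only summation double-count the shared evidence, which is the gap rationales are meant to close.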

Paper Structure

This paper contains 40 sections, 9 theorems, 32 equations, and 5 figures.

Key Result

Lemma 1

Let $Y$ be a binary outcome with prior log odds $\lambda_\pi$ and $\Lambda$ be a vector of signals distributed as $\Lambda|Y=1 \sim \mathcal{N}(\mu, \Sigma)$ and $\Lambda|Y=0 \sim \mathcal{N}(-\mu, \Sigma)$. Then, the Bayesian posterior log odds that aggregates the signals is $\gamma = 2\mu^T \Sigma^{-1} \Lambda + \lambda_\pi$.
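Lemma 1 can be checked numerically. For class-conditional Gaussians with means $\pm\mu$ and shared covariance $\Sigma$, the quadratic terms cancel in the log-likelihood ratio, which leaves posterior log odds $\lambda_\pi + 2\mu^T\Sigma^{-1}\Lambda$. The sketch below assumes NumPy is available and uses hypothetical function names. It compares that closed form with a direct application of Bayes' rule. It also shows that, in this two-signal example with equal means, positive correlation shrinks the combined weight of the signals.

```python
import numpy as np

def posterior_log_odds(prior_log_odds: float, mu: np.ndarray, sigma: np.ndarray, lam: np.ndarray) -> float:
    """Closed form: lambda_pi + 2 mu^T Sigma^{-1} Lambda (Sigma symmetric positive definite)."""
    weights = np.linalg.solve(sigma, mu)  # Sigma^{-1} mu without forming the inverse
    return float(prior_log_odds + 2.0 * weights @ lam)

def gaussian_log_density(x: np.ndarray, mean: np.ndarray, sigma: np.ndarray) -> float:
    """Log density of a multivariate normal N(mean, sigma) evaluated at x."""
    diff = x - mean
    _, logdet = np.linalg.slogdet(sigma)
    quad = diff @ np.linalg.solve(sigma, diff)
    return float(-0.5 * (quad + logdet + len(x) * np.log(2.0 * np.pi)))

def posterior_log_odds_bayes(prior_log_odds: float, mu: np.ndarray, sigma: np.ndarray, lam: np.ndarray) -> float:
    """Bayes' rule directly: prior log odds + log N(lam; mu, Sigma) - log N(lam; -mu, Sigma)."""
    return prior_log_odds + gaussian_log_density(lam, mu, sigma) - gaussian_log_density(lam, -mu, sigma)

if __name__ == "__main__":
    mu = np.array([0.5, 0.5])
    lam = np.array([0.8, 1.1])  # an observed signal vector
    correlated = np.array([[1.0, 0.6], [0.6, 1.0]])
    print(round(posterior_log_odds(0.0, mu, correlated, lam), 4))        # 1.1875
    print(round(posterior_log_odds_bayes(0.0, mu, correlated, lam), 4))  # 1.1875
    print(round(posterior_log_odds(0.0, mu, np.eye(2), lam), 4))         # 1.9 (uncorrelated signals)
```

With correlation 0.6, each signal's weight drops from $0.5$ to $0.5/1.6$. Summing the two signals as if they were independent would overstate the evidence.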

Figures (5)

  • Figure 1: Expected belief (probability) as a function of the number of experts, with rationales (orange) and without rationales (blue), when $\rho=0.05$ and $Y=1$, for various values of $\alpha$ and $\tau$. The shaded area spans one standard deviation.
  • Figure 2: Expected belief (probability) as a function of the number of experts, with rationales (orange) and without rationales (blue), when $\rho=0.25$ and $Y=1$, for various values of $\alpha$ and $\tau$. The shaded area spans one standard deviation.
  • Figure 3: Expected belief (probability) as a function of the number of experts, with rationales (orange) and without rationales (blue), when $\rho=0.5$ and $Y=1$, for various values of $\alpha$ and $\tau$. The shaded area spans one standard deviation.
  • Figure 4: Expected belief (probability) as a function of the number of experts, with rationales (orange) and without rationales (blue), when $\rho=0.75$ and $Y=1$, for various values of $\alpha$ and $\tau$. The shaded area spans one standard deviation.
  • Figure 5: Numerical calculation of the ex-ante expected log score for expert $t$ when experts $1, \ldots, t-1$ report truthfully and submit their rationales, for different values of $\rho$, given $\alpha, \tau$.

Theorems & Definitions (24)

  • Example 1
  • Definition 1: Rationales
  • Lemma 1: Aggregation of Correlated Gaussian Signals
  • Proof
  • Theorem 1: Aggregate without rationales
  • Theorem 2: Aggregate with rationales
  • Theorem 3: Ex-ante log posterior odds is further from log prior odds with rationales
  • Definition 2: Scoring Rules
  • Definition 3: Market Scoring Rules
  • Remark 1: Log Scoring Rule
  • ...and 14 more