Tell Me Why: Incentivizing Explanations
Siddarth Srinivasan, Ezra Karger, Michiel Bakker, Yiling Chen
TL;DR
This work formalizes rationales as a mechanism to reveal informational overlaps among experts, enabling faster and more efficient Bayesian aggregation than belief-only reporting. It introduces a deliberation mechanism, involving a supervisor and experts, in which truthful reporting of beliefs and rationales is a perfect Bayesian equilibrium; truthfulness is incentivized through proper scoring rules and the supervisor's commitment to ignore non-rationale reports. The model shows how rationales refine the filtration of information, isolating new, shared, and old information components to improve collective forecasts. The framework has broad implications for judgmental forecasting, AI alignment, and scalable oversight, with extensions to markets, self-resolving variants, and LLM-enabled architectures.
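The incentive side of the mechanism rests on proper scoring rules. As a minimal illustration (a standard textbook fact, not the paper's deliberation mechanism), the sketch below uses the logarithmic scoring rule for a binary event and checks numerically that an expert maximizes expected score by reporting their true belief. The function names and the grid search are illustrative assumptions.

```python
import math


def log_score(report: float, outcome: int) -> float:
    """Logarithmic scoring rule: reward the log of the probability assigned to what happened."""
    return math.log(report) if outcome == 1 else math.log(1.0 - report)


def expected_score(report: float, belief: float) -> float:
    """Expected log score of `report` when the expert believes the event occurs with prob. `belief`."""
    return belief * log_score(report, 1) + (1.0 - belief) * log_score(report, 0)


def best_report(belief: float, grid_size: int = 999) -> float:
    """Search a grid of reports strictly inside (0, 1) for the one maximizing expected score."""
    grid = [(i + 1) / (grid_size + 1) for i in range(grid_size)]
    return max(grid, key=lambda r: expected_score(r, belief))


if __name__ == "__main__":
    for belief in (0.1, 0.3, 0.7):
        # A proper scoring rule makes the truthful report optimal.
        print(belief, best_report(belief))
```

Because the expected log score is strictly concave in the report with its maximum at the true belief, any misreport lowers the expert's expected payoff; this is the property that lets a mechanism reward honest belief reports.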
Abstract
Common sense suggests that when individuals explain why they believe something, we can arrive at more accurate conclusions than when they simply state what they believe. Yet there is no known mechanism that provides incentives to elicit explanations for beliefs from agents. This likely stems from the fact that, in order to show efficient information aggregation, standard Bayesian models make assumptions (like conditional independence of signals) that preempt the need for explanations. A natural justification for the value of explanations is that agents' beliefs tend to be drawn from overlapping sources of information, so agents' belief reports do not reveal all that needs to be known. Indeed, this work argues that rationales (explanations of an agent's private information) lead to more efficient aggregation by allowing agents to efficiently identify what information they share and what information is new. Building on this model of rationales, we present a novel 'deliberation mechanism' to elicit rationales from agents in which truthful reporting of beliefs and rationales is a perfect Bayesian equilibrium.
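The claim that belief reports alone cannot resolve overlapping information can be seen in a toy model. The sketch below is an illustrative assumption, not the paper's formal (filtration-based) model: a binary state, conditionally independent signals whose log-likelihood ratios (LLRs) add in log-odds space, and two experts who each see one shared signal plus one private signal. A "rationale" is simplified here to revealing the shared component's LLR; the class and function names are hypothetical.

```python
import math
from dataclasses import dataclass


def sigmoid(x: float) -> float:
    """Convert log-odds to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


@dataclass
class Scenario:
    prior_logodds: float  # common prior log-odds of the binary state
    shared: float         # LLR of the signal both experts observe
    private_a: float      # LLR of expert A's private signal
    private_b: float      # LLR of expert B's private signal

    def report_a(self) -> float:
        # Expert A's posterior log-odds: prior plus every signal A observed.
        return self.prior_logodds + self.shared + self.private_a

    def report_b(self) -> float:
        return self.prior_logodds + self.shared + self.private_b

    def full_information(self) -> float:
        # Posterior if all distinct signals were pooled, each counted once.
        return self.prior_logodds + self.shared + self.private_a + self.private_b


def aggregate_beliefs_only(la: float, lb: float, prior: float) -> float:
    # Naive pooling that treats the two reports as based on disjoint evidence.
    return la + lb - prior


def aggregate_with_rationale(la: float, lb: float, prior: float, shared: float) -> float:
    # If the shared component is revealed, count it once instead of twice.
    return la + lb - prior - shared


# Two worlds with identical belief reports but different correct aggregates.
s1 = Scenario(prior_logodds=0.0, shared=1.0, private_a=0.0, private_b=0.0)
s2 = Scenario(prior_logodds=0.0, shared=0.0, private_a=1.0, private_b=1.0)

if __name__ == "__main__":
    for s in (s1, s2):
        la, lb = s.report_a(), s.report_b()
        print(
            sigmoid(la), sigmoid(lb),
            sigmoid(s.full_information()),
            sigmoid(aggregate_beliefs_only(la, lb, s.prior_logodds)),
            sigmoid(aggregate_with_rationale(la, lb, s.prior_logodds, s.shared)),
        )
```

In both scenarios the experts report the same beliefs, yet the full-information posterior differs, so no rule that sees only beliefs can be correct in both; once the overlap is revealed, the aggregate matches full information in each case.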
