Designing Rules for Choosing a Winner in a Debate

Alexander Heckett, Vincent Conitzer

TL;DR

This work formalizes how a central authority can design winner-selection rules in debates between two informed agents under verifiable arguments. It introduces three debate-game classes (CKDG, CKDDG, PIDDG), defines a policy-based error metric, and analyzes both computational complexity (polynomial-time evaluation vs. NP-complete policy design) and performance guarantees via probabilistic highlighting and ranking strategies. The results show that, under common-knowledge distinctions, error can decay faster than any polynomial in the action-ratio, while private-information settings yield weaker but still meaningful bounds, illuminating when randomized policies are advantageous. The findings advance understanding of mechanism design for AI safety and debate-based decision-making, and point to future work on iterated debates, asymmetries, and concise representations.

Abstract

We consider settings where an uninformed principal must hear arguments from two better-informed agents, each arguing for one of two possible courses of action. The arguments are verifiable in the sense that the true state of the world restricts the arguments that the agents can make. Each agent simply wants to be chosen as the winner and argues strategically in response to the rule set by the principal. How should the principal design the rule to choose the better action? We provide a formal framework for answering this question, exhibit some of its basic properties, study the computational problems of evaluating and optimizing the principal's policy, and provide key error bounds.

Paper Structure

This paper contains 18 sections, 16 theorems, 14 equations, 3 tables.

Key Result

Proposition 3.5

There exist PIDDGs $B = (A, S, P, C_w, C_l)$ and $B' = (A, S, P, C_w', C_l')$ with $C_w'(s) \supseteq C_w(s)$ and $C_l'(s) \subseteq C_l(s)$ for all $s \in S$ such that the minimum possible error of a policy for $B'$ exceeds the minimum possible error of a policy for $B$.

Theorems & Definitions (52)

  • Definition 2.1
  • Definition 2.2
  • Definition 2.3
  • Definition 2.4
  • Definition 2.5
  • Definition 2.6
  • Definition 2.7
  • Definition 2.8
  • Remark 2.9
  • Definition 2.10
  • ...and 42 more