Designing Rules for Choosing a Winner in a Debate
Alexander Heckett, Vincent Conitzer
TL;DR
This work formalizes how a central authority can design winner-selection rules in debates between two informed agents under verifiable arguments. It introduces three debate-game classes (CKDG, CKDDG, PIDDG), defines a policy-based error metric, and analyzes both computational complexity (polynomial-time evaluation vs. NP-complete policy design) and performance guarantees via probabilistic highlighting and ranking strategies. The results show that, under common-knowledge distinctions, error can decay faster than any polynomial in the action-ratio, while private-information settings yield weaker but still meaningful bounds, illuminating when randomized policies are advantageous. The findings advance understanding of mechanism design for AI safety and debate-based decision-making, and point to future work on iterated debates, asymmetries, and concise representations.
Abstract
We consider settings where an uninformed principal must hear arguments from two better-informed agents, each arguing for one of two possible courses of action. The arguments are verifiable in the sense that the true state of the world restricts the arguments that the agents can make. Each agent simply wants to be chosen as the winner and argues strategically given the rule set by the principal. How should the principal design the rule so as to choose the better action? We provide a formal framework for answering this question, establish some of its basic properties, study the computational problems of evaluating and optimizing the principal's policy, and provide key error bounds.
