
A superpersuasive autonomous policy debating system

Allen Roush, Devin Gonier, John Hines, Judah Goldfeder, Philippe Martin Wyder, Sanjay Basu, Ravid Shwartz Ziv

TL;DR

DeepDebater addresses the challenge of high-stakes, evidence-grounded AI persuasion by introducing a hierarchical, multi-agent architecture that supports end-to-end generation of competitive policy debates. It combines a large, curated evidence corpus with iterative retrieval, structured generation, and self-critique to produce coherent eight-speech rounds, including cross-examinations and judge-like decisions, delivered via TTS and talking-head visuals. The work demonstrates superior argumentative quality and competitive performance against human strategies in simulated rounds, and it provides an open-source implementation for broader research use. Together, these contributions advance research in structured argumentation, AI persuasion, and human-AI collaboration, while also highlighting governance and safety considerations for powerful persuasion systems.

Abstract

The capacity for highly complex, evidence-based, and strategically adaptive persuasion remains a formidable grand challenge for artificial intelligence. Previous work, like IBM Project Debater, focused on generating persuasive speeches in simplified and shortened debate formats intended for relatively lay audiences. We introduce DeepDebater, a novel autonomous system capable of participating in and winning a full, unmodified, two-team competitive policy debate. Our system employs a hierarchical architecture of specialized multi-agent workflows, where teams of LLM-powered agents collaborate and critique one another to perform discrete argumentative tasks. Each workflow utilizes iterative retrieval, synthesis, and self-correction using a massive corpus of policy debate evidence (OpenDebateEvidence) and produces complete speech transcripts, cross-examinations, and rebuttals. We introduce a live, interactive end-to-end presentation pipeline that renders debates with AI speech and animation: transcripts are surface-realized and synthesized to audio with OpenAI TTS, and then displayed as talking-head portrait videos with EchoMimic V1. Beyond fully autonomous matches (AI vs AI), DeepDebater supports hybrid human-AI operation: human debaters can intervene at any stage, and humans can optionally serve as opponents against AI in any speech, allowing AI-human and AI-AI rounds. In preliminary evaluations against human-authored cases, DeepDebater produces qualitatively superior argumentative components and consistently wins simulated rounds as adjudicated by an independent autonomous judge. Expert human debate coaches also prefer the arguments, evidence, and cases constructed by DeepDebater. We open source all code, generated speech transcripts, audio and talking head video here: https://github.com/Hellisotherpeople/DeepDebater/tree/main
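For readers unfamiliar with the format, the eight speeches and four cross-examinations of a standard policy round, together with the per-speech human or AI control described above, can be sketched as a small data structure. This is a hypothetical illustration, not code from the DeepDebater repository: `Slot`, `ROUND`, and `assign_controllers` are invented names, and the order shown is the conventional policy debate order (1AC, 1NC, 2AC, 2NC constructives, each followed by a cross-examination, then the 1NR, 1AR, 2NR, and 2AR rebuttals).

```python
from dataclasses import dataclass
from enum import Enum


class Side(Enum):
    AFF = "Affirmative"
    NEG = "Negative"

    @property
    def opponent(self) -> "Side":
        return Side.NEG if self is Side.AFF else Side.AFF


class Kind(Enum):
    CONSTRUCTIVE = "constructive"
    CROSS_EX = "cross-examination"
    REBUTTAL = "rebuttal"


@dataclass(frozen=True)
class Slot:
    name: str
    side: Side  # team giving the speech; for cross-ex, the team being questioned
    kind: Kind


def _constructive(name: str, side: Side) -> list[Slot]:
    # Each constructive is followed by a cross-examination of that speaker.
    return [Slot(name, side, Kind.CONSTRUCTIVE),
            Slot(f"CX of {name}", side, Kind.CROSS_EX)]


# Hypothetical encoding of the conventional policy round order.
ROUND: list[Slot] = [
    *_constructive("1AC", Side.AFF),
    *_constructive("1NC", Side.NEG),
    *_constructive("2AC", Side.AFF),
    *_constructive("2NC", Side.NEG),
    Slot("1NR", Side.NEG, Kind.REBUTTAL),
    Slot("1AR", Side.AFF, Kind.REBUTTAL),
    Slot("2NR", Side.NEG, Kind.REBUTTAL),
    Slot("2AR", Side.AFF, Kind.REBUTTAL),
]


def assign_controllers(round_: list[Slot],
                       human_sides: frozenset[Side] = frozenset(),
                       human_slots: frozenset[str] = frozenset()) -> list[tuple[str, str]]:
    """Label each slot 'human' or 'AI' (hypothetical hybrid-control rule).

    A speech is human-controlled if its team is human or the slot is listed
    individually; a cross-ex involves both teams, so it is human-controlled
    if either team is human.
    """
    plan = []
    for slot in round_:
        teams = {slot.side}
        if slot.kind is Kind.CROSS_EX:
            teams.add(slot.side.opponent)  # the questioner is the other team
        human = bool(teams & human_sides) or slot.name in human_slots
        plan.append((slot.name, "human" if human else "AI"))
    return plan
```

For example, `assign_controllers(ROUND, human_slots=frozenset({"1AR"}))` marks only the 1AR as human, matching the case of a volunteer joining an AI teammate for a single speech, while `human_sides=frozenset({Side.NEG})` hands the whole Negative team, including its cross-examinations, to humans.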

Paper Structure

This paper contains 26 sections, 3 figures, 3 tables.

Figures (3)

  • Figure 1: Creative System Demonstration: After the audience picks a resolution, two onscreen systems, Team Affirmative (red) and Team Negative (blue), launch specialist agents (Affirmative: Plan-text, Harms, Inherency, Advantages, Solvency; Negative: Topicality/Theory, Disadvantage, Counterplan, Kritik, On-case Rebuttal). Each team uses gpt-4-mini with OpenDebateEvidence indexed in DuckDB, and a live UI streams the AG2 agent chats, searches, and evidence as arguments are drafted. Completed speeches are rendered on screen, voiced with GPT-4o mini TTS, and animated with EchoMimic V1 while the other side prepares its reply, cycling through the full debate round. Independent Judge agents (green), powered by Claude or Gemini, judge the round after the final speech. Brave audience volunteers may take the place of one full team or join an AI teammate for any speech. They may also propose a new topic, which triggers a new debate.
  • Figure 2: This loop of generation, structured generation, retrieval, and critical review continues for a set number of iterations or until the Reviewer agent is satisfied. Structured outputs, enforced via Pydantic models, guarantee that agent messages are machine-readable and conform to the expected format for each task.
  • Figure 3: Complete System Architectural Diagram
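The retrieve, draft, and review loop of Figure 2 can be sketched as follows. This is a minimal, hypothetical sketch rather than the DeepDebater implementation: the agent callables `retrieve`, `draft`, and `review`, the `Review` schema, and the `max_iters` default are assumptions, and a stdlib dataclass with explicit checks stands in for the Pydantic models the system uses, so the example runs without extra dependencies.

```python
import json
from dataclasses import dataclass, fields
from typing import Callable


@dataclass
class Review:
    """Hypothetical reviewer schema (stdlib stand-in for a Pydantic model)."""
    satisfied: bool
    feedback: str


def parse_review(raw: str) -> Review:
    """Parse a reviewer message as JSON and reject anything off-schema."""
    data = json.loads(raw)  # raises json.JSONDecodeError (a ValueError) on bad JSON
    expected = {f.name for f in fields(Review)}
    if not isinstance(data, dict) or set(data) != expected:
        raise ValueError(f"expected keys {sorted(expected)}, got {data!r}")
    if not isinstance(data["satisfied"], bool) or not isinstance(data["feedback"], str):
        raise ValueError("field types do not match the Review schema")
    return Review(**data)


def refine(task: str,
           retrieve: Callable[[str], list[str]],
           draft: Callable[[str, list[str], str], str],
           review: Callable[[str], str],
           max_iters: int = 3) -> tuple[str, int]:
    """Retrieve evidence, draft, and review until satisfied or max_iters is hit.

    Returns the last draft and the number of iterations used.
    """
    feedback = ""
    text = ""
    for i in range(1, max_iters + 1):
        # Reviewer feedback is folded into the next retrieval query.
        evidence = retrieve(f"{task} {feedback}".strip())
        text = draft(task, evidence, feedback)
        verdict = parse_review(review(text))
        if verdict.satisfied:
            return text, i
        feedback = verdict.feedback
    return text, max_iters
```

Validating every reviewer message against a fixed schema lets the loop branch on a typed `satisfied` flag instead of scraping free text, which is the role the caption assigns to the Pydantic models; the iteration cap bounds cost when the reviewer is never satisfied.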