The Argument is the Explanation: Structured Argumentation for Trust in Agents

Ege Cakar, Per Ola Kristensson

TL;DR

This work tackles the challenge of trustworthy AI by replacing opaque model explanations with verifiable structured argumentation. It introduces Bipolar Assumption-Based Argumentation (B-ABA) as the core formalism, enabling explicit attack and support relations and computable extensions that certify argument acceptability. Through an end-to-end pipeline that converts natural language into argument graphs, the approach achieves state-of-the-art results on argumentative boundary extraction ($94.44$ token-F1; $86.97$ exact F1) and strong 3-class relation classification on AMT ($0.81$ macro-F1) using ModernBERT-large, with competitive performance against much larger models. The system demonstrates a deployable multi-agent risk assessment workflow via Structured What-If Technique (SWIFT), including automatic fact-checking and a test-time feedback loop that refines arguments without retraining, aided by open-source tooling and Docker deployment. Collectively, the work provides a practical, verifiable pathway for trustworthy AI in risk assessment and related domains, addressing trust and verification challenges in multi-agent settings while delivering deployable tooling.

Abstract

Humans are black boxes -- we cannot observe their neural processes, yet society functions by evaluating verifiable arguments. AI explainability should follow this principle: stakeholders need verifiable reasoning chains, not mechanistic transparency. We propose using structured argumentation to provide a level of explanation and verification neither interpretability nor LLM-generated explanation is able to offer. Our pipeline achieves state-of-the-art 94.44 macro F1 on the AAEC published train/test split (5.7 points above prior work) and $0.81$ macro F1, $\sim$0.07 above previous published results with comparable data setups, for Argumentative MicroTexts relation classification, converting LLM text into argument graphs and enabling verification at each inferential step. We demonstrate this idea on multi-agent risk assessment using the Structured What-If Technique, where specialized agents collaborate transparently to carry out risk assessment otherwise achieved by humans alone. Using Bipolar Assumption-Based Argumentation, we capture support/attack relationships, thereby enabling automatic hallucination detection via fact nodes attacking arguments. We also provide a verification mechanism that enables iterative refinement through test-time feedback without retraining. For easy deployment, we provide a Docker container for the fine-tuned AMT model, and the rest of the code with the Bipolar ABA Python package on GitHub.
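As a rough illustration of the support/attack structure and the fact-node check described above, the following minimal Python sketch builds a small labelled graph and reports which claims are attacked by fact nodes. It does not use the authors' Bipolar ABA package and does not compute B-ABA extensions; the class `ArgumentGraph`, its methods, and the node names (`c1`, `f1`, and so on) are hypothetical and chosen only for this example.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Edge:
    source: str
    target: str
    relation: str  # either "support" or "attack"


@dataclass
class ArgumentGraph:
    """Hypothetical toy container; not the API of the authors' package."""
    facts: set = field(default_factory=set)   # names of fact nodes
    edges: list = field(default_factory=list)

    def add_edge(self, source: str, target: str, relation: str) -> None:
        if relation not in ("support", "attack"):
            raise ValueError(f"unknown relation: {relation}")
        self.edges.append(Edge(source, target, relation))

    def attack_rate(self) -> float:
        """Fraction of edges that are attacks (0.0 for an empty graph)."""
        if not self.edges:
            return 0.0
        attacks = sum(1 for e in self.edges if e.relation == "attack")
        return attacks / len(self.edges)

    def flagged_by_facts(self) -> list:
        """Claims that are the target of an attack edge from a fact node."""
        return sorted({
            e.target for e in self.edges
            if e.relation == "attack" and e.source in self.facts
        })


# Mined argument graph before fact-checking (illustrative data only).
graph = ArgumentGraph()
graph.add_edge("c1", "c2", "support")
graph.add_edge("c3", "c2", "attack")
graph.add_edge("c4", "c1", "support")
graph.add_edge("c5", "c4", "support")
baseline = graph.attack_rate()

# Fact-checking adds a fact node that contradicts claim c4.
graph.facts.add("f1")
graph.add_edge("f1", "c4", "attack")

print(baseline, graph.attack_rate(), graph.flagged_by_facts())
```

In this toy graph the only claim flagged for review is `c4`, the one contradicted by the fact node. A real pipeline would additionally compute acceptable extensions over the B-ABA framework, which this sketch omits.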

Paper Structure

This paper contains 20 sections, 4 figures, 6 tables.

Figures (4)

  • Figure 1: The complete pipeline from team generation to graph construction and verification. The SWIFT coordinator grants and receives control from experts who write to a shared working document; mining converts prose to literals, classifies relations, builds a B-ABA graph with fact nodes, and returns feedback. † All experts use a dual-agent (creative+critic) design.
  • Figure 2: Comparison of argumentation graphs from different document types. The edge colors (green=support, red=attack) reveal the distinct argumentative patterns: collaborative in technical documents vs. adversarial in debates.
  • Figure 3: The risk assessment graph after fact-checking. Fact nodes inject attack edges into the structure, creating clusters where factual contradictions are detected. The attack rate, approximately 3x the baseline, validates the automatic verification capability.
  • Figure 4: Sample coordinator response after receiving fact-checking feedback, demonstrating the system's ability to identify and prioritize weaknesses in the argumentation structure, as well as its coordination performance.