The Argument is the Explanation: Structured Argumentation for Trust in Agents
Ege Cakar, Per Ola Kristensson
TL;DR
This work tackles the challenge of trustworthy AI by replacing opaque model explanations with verifiable structured argumentation. It uses Bipolar Assumption-Based Argumentation (B-ABA) as the core formalism, which makes attack and support relations explicit and yields computable extensions that certify argument acceptability. An end-to-end pipeline converts natural language into argument graphs, achieving state-of-the-art results on argumentative boundary extraction ($94.44$ token-F1; $86.97$ exact F1) and strong 3-class relation classification on Argumentative MicroTexts (AMT) ($0.81$ macro-F1) using ModernBERT-large, with performance competitive with much larger models. The system demonstrates a deployable multi-agent risk assessment workflow based on the Structured What-If Technique (SWIFT), including automatic fact-checking and a test-time feedback loop that refines arguments without retraining, supported by open-source tooling and Docker deployment. Together, these contributions offer a practical, verifiable pathway to trustworthy AI in risk assessment and related domains, addressing trust and verification challenges in multi-agent settings.
Abstract
Humans are black boxes -- we cannot observe their neural processes, yet society functions by evaluating verifiable arguments. AI explainability should follow this principle: stakeholders need verifiable reasoning chains, not mechanistic transparency. We propose using structured argumentation to provide a level of explanation and verification that neither interpretability nor LLM-generated explanations can offer. Our pipeline converts LLM text into argument graphs and enables verification at each inferential step. It achieves a state-of-the-art 94.44 macro F1 on the published AAEC train/test split (5.7 points above prior work) and $0.81$ macro F1 on Argumentative MicroTexts (AMT) relation classification, $\sim$0.07 above previously published results with comparable data setups. We demonstrate this idea on multi-agent risk assessment using the Structured What-If Technique, where specialized agents collaborate transparently to carry out risk assessments otherwise performed by humans alone. Using Bipolar Assumption-Based Argumentation, we capture support/attack relationships, thereby enabling automatic hallucination detection via fact nodes attacking arguments. We also provide a verification mechanism that enables iterative refinement through test-time feedback without retraining. For easy deployment, we provide a Docker container for the fine-tuned AMT model and release the rest of the code, together with the Bipolar ABA Python package, on GitHub.
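To make the idea of fact nodes attacking arguments concrete, the following is a minimal sketch and not the authors' implementation or the API of the Bipolar ABA package. It swaps in a simpler technique: Dung-style grounded semantics over an abstract attack graph, with the simplifying assumption that an attack on a supporter also counts as an attack on whatever it supports. All node names and function names are hypothetical.

```python
from typing import Dict, Set, Tuple

Edge = Tuple[str, str]  # (source, target)


def propagate_attacks(attacks: Set[Edge], supports: Set[Edge]) -> Set[Edge]:
    """Assumption: if X attacks A and A supports B, then X also attacks B."""
    closed = set(attacks)
    changed = True
    while changed:
        changed = False
        for x, a in list(closed):
            for s, b in supports:
                if s == a and (x, b) not in closed:
                    closed.add((x, b))
                    changed = True
    return closed


def grounded_extension(arguments: Set[str], attacks: Set[Edge]) -> Set[str]:
    """Least fixed point of the characteristic function (grounded semantics)."""
    attackers: Dict[str, Set[str]] = {a: set() for a in arguments}
    for src, dst in attacks:
        attackers[dst].add(src)
    accepted: Set[str] = set()
    while True:
        # An argument is defended if each of its attackers is itself
        # attacked by some already accepted argument.
        defended = {
            a for a in arguments
            if all(attackers[b] & accepted for b in attackers[a])
        }
        if defended == accepted:
            return accepted
        accepted = defended


# Hypothetical risk-assessment fragment: a verified fact contradicts a
# generated claim, which in turn supports a risk conclusion.
arguments = {"fact", "claim", "conclusion", "mitigation"}
attacks = {("fact", "claim")}
supports = {("claim", "conclusion")}

all_attacks = propagate_attacks(attacks, supports)
accepted = grounded_extension(arguments, all_attacks)
rejected = arguments - accepted  # candidates to flag as unsupported
```

In this toy graph the fact node and the unrelated mitigation argument are accepted, while the contradicted claim and the conclusion resting on it are rejected. This mirrors, in a much reduced form, how an attacking fact node can flag a hallucinated step and everything that depends on it.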
