Argumentative Debates for Transparent Bias Detection [Technical Report]
Hamed Ayoobi, Nico Potyka, Anna Rapberger, Francesca Toni
TL;DR
ABIDE addresses the need for transparent bias detection in AI by combining neighbourhood-based local fairness with Quantitative Bipolar Argumentation Frameworks and gradual semantics. It formalizes a neighbourhood-based notion of bias, defines argument schemes with critical questions, and integrates local evidence across neighbourhoods into a global bias assessment for a protected feature value $X_p=g$. The approach yields interpretable, debate-based explanations and outperforms a previous argumentative baseline at bias detection across synthetic, real-world, and LLM settings. The framework supports explainable, modular bias reasoning suited to human-agent and multi-agent interactions, and could be adopted in fairness-focused AI systems.
Abstract
As the use of AI in society grows, addressing emerging biases is essential to prevent systematic discrimination. Several bias detection methods have been proposed, but, with few exceptions, these tend to ignore transparency. Yet interpretability and explainability are core requirements for algorithmic fairness, even more so than for other algorithmic solutions, given the human-oriented nature of fairness. We present ABIDE (Argumentative BIas detection by DEbate), a novel framework that structures bias detection transparently as a debate, guided by an underlying argument graph as understood in (formal and computational) argumentation. The arguments concern the success chances of groups in local neighbourhoods and the significance of these neighbourhoods. We evaluate ABIDE experimentally and show that it outperforms an argumentative baseline.
