Argumentative Debates for Transparent Bias Detection [Technical Report]

Hamed Ayoobi, Nico Potyka, Anna Rapberger, Francesca Toni

TL;DR

ABIDE addresses the need for transparent bias detection in AI by combining neighbourhood-based local fairness with Quantitative Bipolar Argumentation Frameworks and gradual semantics. It formalizes a neighbourhood notion of bias, defines argument schemes with critical questions, and integrates local evidence across neighbourhoods into a global bias assessment for a protected feature value $X_p=g$. The approach yields interpretable debate-based explanations and demonstrates superior detection performance over a previous argumentative baseline across synthetic, real-world, and LLM settings. This framework enables explainable, modular bias reasoning suitable for human-agent and multi-agent interactions with potential for broad adoption in fairness-focused AI systems.

Abstract

As the use of AI in society grows, addressing emerging biases is essential to prevent systematic discrimination. Several bias detection methods have been proposed, but, with few exceptions, these tend to ignore transparency. Yet interpretability and explainability are core requirements for algorithmic fairness, even more so than for other algorithmic solutions, given the human-oriented nature of fairness. We present ABIDE (Argumentative BIas detection by DEbate), a novel framework that structures bias detection transparently as debate, guided by an underlying argument graph as understood in (formal and computational) argumentation. The arguments are about the success chances of groups in local neighbourhoods and the significance of these neighbourhoods. We evaluate ABIDE experimentally and demonstrate its strengths in performance against an argumentative baseline.
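
To make "success chances of groups in local neighbourhoods" concrete, the following is a minimal sketch rather than ABIDE's actual procedure. It assumes a neighbourhood is the set of the `k` points nearest to a centre under Euclidean distance, and that bias evidence is the gap between the success rate of the unprotected group and that of the protected group. The function names, the toy data, and the choice of distance are all hypothetical.

```python
from math import dist

def neighbourhood(points, centre, k):
    """Indices of the k points closest to centre (Euclidean distance, an assumption)."""
    order = sorted(range(len(points)), key=lambda i: dist(points[i], centre))
    return order[:k]

def success_rate_gap(points, protected, outcomes, centre, k):
    """Success rate of the unprotected group minus that of the protected group
    inside the k-nearest neighbourhood of centre.

    Returns None when one of the two groups is absent from the neighbourhood,
    since no local comparison is possible in that case.
    """
    idx = neighbourhood(points, centre, k)
    prot = [outcomes[i] for i in idx if protected[i]]
    rest = [outcomes[i] for i in idx if not protected[i]]
    if not prot or not rest:
        return None
    return sum(rest) / len(rest) - sum(prot) / len(prot)

# Toy data (hypothetical): feature vectors, protected-group flags, binary outcomes.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (5, 5), (6, 5)]
protected = [True, True, False, False, True, False]
outcomes = [0, 1, 1, 1, 1, 0]

gap = success_rate_gap(points, protected, outcomes, centre=(0, 0), k=4)
```

A positive gap in a neighbourhood is local evidence that the protected group fares worse among otherwise similar individuals. How significant that neighbourhood is would be a separate question, which this sketch does not address.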

Paper Structure

This paper contains 21 sections, 6 theorems, 4 figures, 4 tables.

Key Result

Proposition 1

If $\mathcal{N}_\mathbf{x}$ is the $\epsilon$-neighbourhood of a point $\mathbf{x} \in S$ with respect to the distance induced by a seminorm $\| . \|$ (a seminorm is a function $\| . \|: \mathcal{D} \rightarrow \mathcal{D}$ that satisfies sub-additivity (triangle inequality) and absolute homogeneity),

Figures (4)

  • Figure 1: A QBAF generated by ABIDE for the COMPAS dataset (with protected feature $X_p = race$ and protected group $g =$ African-American) for two neighbourhoods of size $K = 10, 100$ (critical question arguments with zero strength omitted). Dashed/solid edges are supports/attacks. Green/red edges are supports/attacks from arguments with nonzero strength, and black edges are from arguments with zero strength. Edge width reflects the strength of the originating argument. (Strengths are shown below/above nodes. Details in Section \ref{sec:main}.)
  • Figure 2: Argument scheme for neighbourhood $\mathcal{N}$. $X_p = g$ identifies the protected, potentially disadvantaged group.
  • Figure 3: Argument scheme for combining multiple neighbourhoods $\mathcal{N}_1, \ldots, \mathcal{N}_m$.
  • Figure 4: Example of an Argumentative Debate (the debates in the yellow/blue boxes stem from the corresponding boxes in Figure \ref{fig:arg_compas}).
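
The Figure 1 caption describes a QBAF whose arguments carry strengths and are linked by supports and attacks. As a rough illustration of how a gradual semantics turns base scores into final strengths, the sketch below evaluates a small acyclic QBAF with the DF-QuAD semantics. This summary does not say which gradual semantics ABIDE uses, so DF-QuAD is swapped in purely as an example. The argument names and base scores are hypothetical.

```python
def aggregate(strengths):
    """DF-QuAD aggregation: 1 - product of (1 - s); an empty list yields 0."""
    prod = 1.0
    for s in strengths:
        prod *= 1.0 - s
    return 1.0 - prod

def combine(base, attack, support):
    """DF-QuAD influence: move the base score towards 0 or 1 by |attack - support|."""
    if attack >= support:
        return base - base * (attack - support)
    return base + (1.0 - base) * (support - attack)

def evaluate(base_scores, attackers, supporters):
    """Final strengths for an acyclic QBAF, computed recursively with memoisation."""
    strength = {}

    def visit(arg):
        if arg not in strength:
            att = aggregate([visit(a) for a in attackers.get(arg, [])])
            sup = aggregate([visit(s) for s in supporters.get(arg, [])])
            strength[arg] = combine(base_scores[arg], att, sup)
        return strength[arg]

    for arg in base_scores:
        visit(arg)
    return strength

# Hypothetical three-argument debate: local evidence supports the bias claim,
# and an objection (standing in for a critical question) attacks that evidence.
base_scores = {"bias": 0.5, "local_evidence": 0.8, "objection": 0.5}
supporters = {"bias": ["local_evidence"]}
attackers = {"local_evidence": ["objection"]}

final = evaluate(base_scores, attackers, supporters)
```

In this toy graph the objection weakens the local evidence from 0.8 to 0.4, and the weakened evidence still raises the bias claim from 0.5 to roughly 0.7. The recursion assumes the graph has no cycles; a cyclic QBAF would need an iterative scheme instead.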

Theorems & Definitions (16)

  • Definition 1
  • Definition 2
  • Definition 3
  • Proposition 1
  • Definition 4
  • Definition 5
  • Definition 6
  • Definition 7
  • Definition 8
  • Proposition 2
  • ...and 6 more