Fairness Auditing with Multi-Agent Collaboration

Martijn de Vos, Akash Dhasade, Jade Garcia Bourrée, Anne-Marie Kermarrec, Erwan Le Merrer, Benoit Rottembourg, Gilles Tredan

TL;DR

The paper explores fairness auditing with multiple collaborating agents, each auditing a protected attribute under a fixed query budget. It analyzes how sampling methods (uniform, stratified, Neyman) interact with collaboration modes (no collaboration, a-posteriori, a-priori) to affect the accuracy of demographic parity (DP) estimates. Theoretical results show that collaboration generally improves audit accuracy, but extensive a-priori coordination with stratified sampling can be detrimental as the number of agents grows; under a-posteriori collaboration, advanced sampling methods offer diminishing returns with more agents, making uniform sampling effectively optimal at scale. Empirical validation on the Folktables, German Credit, and ProPublica datasets confirms the theory, demonstrating substantial DP error reductions via collaboration and clarifying when each strategy is advantageous. The work provides practical guidance for coordinating fairness audits in real-world ML systems and outlines avenues for extending the analysis to intersectional and active-sampling settings.

Abstract

Existing work in fairness auditing assumes that each audit is performed independently. In this paper, we consider multiple agents working together, each auditing the same platform for different tasks. Agents have two levers: their collaboration strategy, with or without coordination beforehand, and their strategy for sampling appropriate data points. We theoretically compare the interplay of these levers. Our main findings are that (i) collaboration is generally beneficial for accurate audits, (ii) basic sampling methods often prove to be effective, and (iii) counter-intuitively, extensive coordination on queries often deteriorates audit accuracy as the number of agents increases. Experiments on three large datasets confirm our theoretical results. Our findings motivate collaboration during fairness audits of platforms that use ML models for decision-making.
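
To make the setup concrete, the following is a minimal sketch of a-posteriori collaboration with uniform sampling. It is not the authors' code. The synthetic population, the decision probabilities, and the names `make_population`, `dp_gap`, and `compare` are all assumptions made for illustration. It also assumes that a shared query reveals every protected attribute of the queried profile, so one agent can reuse the other's responses. Demographic parity is taken in its common form, the difference in positive-decision rates between the two groups of a binary attribute.

```python
import random


def make_population(n, seed=0):
    """Synthetic profiles with two binary protected attributes and a decision.

    The probabilities below are arbitrary illustrative values, not taken
    from the paper or its datasets.
    """
    rng = random.Random(seed)
    population = []
    for _ in range(n):
        a = rng.random() < 0.5   # attribute audited by the first agent
        b = rng.random() < 0.3   # attribute audited by the second agent
        # Hypothetical black-box model: positive rate depends on both attributes.
        p = 0.5 + (0.2 if a else 0.0) - (0.1 if b else 0.0)
        population.append({"a": a, "b": b, "y": rng.random() < p})
    return population


def dp_gap(samples, attr):
    """Estimate P(y=1 | attr=1) - P(y=1 | attr=0) from the given samples."""
    group1 = [s["y"] for s in samples if s[attr]]
    group0 = [s["y"] for s in samples if not s[attr]]
    if not group1 or not group0:
        raise ValueError("need at least one sample in each group")
    return sum(group1) / len(group1) - sum(group0) / len(group0)


def compare(population, budget, trials, seed=0):
    """Mean absolute DP error for attribute 'a': solo audit vs. pooled queries.

    Each agent draws `budget` uniform queries. Without collaboration the
    first agent uses only its own; with a-posteriori collaboration it also
    uses the queries and responses shared by the second agent.
    """
    rng = random.Random(seed)
    truth = dp_gap(population, "a")
    solo_total = pooled_total = 0.0
    for _ in range(trials):
        own = rng.sample(population, budget)
        shared = rng.sample(population, budget)
        solo_total += abs(dp_gap(own, "a") - truth)
        pooled_total += abs(dp_gap(own + shared, "a") - truth)
    return solo_total / trials, pooled_total / trials


if __name__ == "__main__":
    pop = make_population(5000, seed=1)
    solo, pooled = compare(pop, budget=100, trials=300, seed=2)
    print(f"solo error: {solo:.4f}, pooled error: {pooled:.4f}")
```

Under these assumptions the pooled estimate uses twice as many responses, so its average error should be lower than the solo estimate, which is the intuition behind finding (i). The sketch does not model a-priori coordination or stratified sampling.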

Paper Structure (46 sections, 5 theorems, 19 equations, 7 figures, 7 tables)

Key Result

Theorem 4.1

Except for stratified sampling under a-priori collaboration, both a-posteriori and a-priori collaboration lead to more accurate results than auditing without collaboration. Apart from one situation (see Theorem 4.3), collaboration is always beneficial and is an effective approach to increase the accuracy of fairness audits, i.e. $Var(\hat{

Figures (7)

  • Figure 1: Possible collaboration strategies of an auditor with her two agents: no collaboration (left, baseline), a-posteriori collaboration where agents share queries and responses (middle), and a-priori collaboration where agents also coordinate on queries to be sent (right).
  • Figure 2: The relative size of the largest stratum for all possible $m$ auditor configurations and three datasets. The red curve is $y=\frac{1}{2x}$.
  • Figure 3: 2-agent collaboration with stratified sampling. The budget ranges are relative to the size of the dataset being studied. We observe that collaboration (a-posteriori and a-priori) can significantly improve DP error. This is in line with Theorem 4.1.
  • Figure 4: Different sampling methods with a-posteriori collaboration. The more agents collaborate, the more all sampling methods tend to converge. This is in line with Theorem 4.2.
  • Figure 5: Different collaborative strategies with stratified sampling. We observe that as more agents collaborate, the a-priori strategy can be disadvantageous. This is in line with Theorem 4.3.
  • ...and 2 more figures
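
The figures compare sampling methods that differ in how a query budget is split across strata. As a reminder of the textbook difference between proportional stratified allocation and Neyman allocation, here is a small sketch. The function names and the example numbers are hypothetical, and the paper's exact stratum definition is not given in this summary, so this may differ from the authors' setup. Allocations are returned as real numbers; rounding to whole queries is left out.

```python
import math


def proportional_allocation(sizes, budget):
    """Split the budget in proportion to stratum sizes."""
    total = sum(sizes)
    return [budget * n / total for n in sizes]


def neyman_allocation(sizes, stds, budget):
    """Split the budget in proportion to stratum size times stratum std.

    Falls back to proportional allocation when every stratum has zero
    spread, since the Neyman weights are then all zero.
    """
    weights = [n * s for n, s in zip(sizes, stds)]
    total = sum(weights)
    if total == 0:
        return proportional_allocation(sizes, budget)
    return [budget * w / total for w in weights]


def bernoulli_std(p):
    """Standard deviation of a binary decision with positive rate p."""
    return math.sqrt(p * (1.0 - p))


if __name__ == "__main__":
    sizes = [600, 400]                                # hypothetical stratum sizes
    stds = [bernoulli_std(0.5), bernoulli_std(0.1)]   # hypothetical positive rates
    print(proportional_allocation(sizes, 84))
    print(neyman_allocation(sizes, stds, 84))
```

Neyman allocation shifts queries toward strata whose decisions vary more, which requires knowing or estimating the per-stratum spread in advance. When all strata have equal spread, the two allocations coincide.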

Theorems & Definitions (7)

  • Definition 2.1
  • Definition 3.1
  • Theorem 4.1
  • Theorem 4.2
  • Theorem 4.3
  • Theorem B.1
  • Theorem C.1