MAFT: Efficient Model-Agnostic Fairness Testing for Deep Neural Networks via Zero-Order Gradient Search

Zhaohui Wang, Min Zhang, Jingran Yang, Bojie Shao, Min Zhang

TL;DR

This work addresses individual fairness testing of deep neural networks under model-agnostic, black-box access. It introduces MAFT, which replaces access to internal gradients with zero-order gradient estimation and uses the estimate to guide a two-phase search (global, then local) for discriminatory instances. MAFT matches or closely approaches the effectiveness of the white-box baselines EIDIG and ADF. Compared with existing black-box methods, it is approximately 14.69× more effective and approximately 32.58× more efficient. The approach scales to large networks and varied architectures, so fairness testing, and retraining to mitigate the biases it finds, can proceed without exposing model internals.

Abstract

Deep neural networks (DNNs) have shown powerful performance in various applications and are increasingly used in decision-making systems. However, concerns about fairness in DNNs persist. Several efficient white-box testing methods for individual fairness have been proposed. The development of black-box methods, by contrast, has stagnated, and the performance of existing black-box methods lags far behind that of white-box methods. In this paper, we propose a novel black-box individual fairness testing method called Model-Agnostic Fairness Testing (MAFT). With MAFT, practitioners can effectively identify and address discrimination in deep learning (DL) models, regardless of the specific algorithm or architecture employed. Our approach adopts lightweight procedures such as gradient estimation and attribute perturbation rather than non-trivial procedures like symbolic execution, making it significantly more scalable and applicable than existing methods. We demonstrate that MAFT achieves the same effectiveness as state-of-the-art white-box methods while improving applicability to large-scale networks. Compared with existing black-box approaches, our approach performs markedly better at discovering fairness violations with respect to both effectiveness (approximately 14.69 times) and efficiency (approximately 32.58 times).
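The "attribute perturbation" step can be illustrated with the definition of individual discrimination commonly used in this line of work: an instance is discriminatory if changing only its protected attributes changes the predicted label. The sketch below assumes that definition. The helper `is_discriminatory`, the toy classifier `biased_label`, and the attribute domains are hypothetical and are not taken from the paper.

```python
import itertools
import numpy as np

def is_discriminatory(predict_label, x, protected, domains):
    """Return True if some change to the protected attributes flips the label.

    predict_label: black-box function mapping an instance to a class label.
    protected:     indices of the protected attributes in x.
    domains:       dict mapping each protected index to its allowed values.
    """
    x = np.asarray(x)
    original = predict_label(x)
    for values in itertools.product(*(domains[i] for i in protected)):
        x_alt = x.copy()
        x_alt[list(protected)] = values  # perturb protected attributes only
        if np.array_equal(x_alt, x):
            continue  # skip the unchanged instance
        if predict_label(x_alt) != original:
            return True
    return False

# Hypothetical biased classifier: attribute 0 (protected) shifts the decision.
def biased_label(x):
    return int(2 * x[0] + x[1] > 3)

domains = {0: [0, 1]}
flagged = is_discriminatory(biased_label, [0, 2, 5], [0], domains)
not_flagged = is_discriminatory(biased_label, [0, 5, 5], [0], domains)
```

Here `flagged` is `True` because flipping attribute 0 moves the first instance across the decision threshold, while `not_flagged` is `False` because the second instance is classified the same way for either value.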

Paper Structure (30 sections, 12 equations, 6 figures, 2 tables, 4 algorithms)

Figures (6)

  • Figure 1: MAFT workflow to generate individual discriminatory instances inherited from EIDIG.
  • Figure 2: Two-Phase Generation Intuition
  • Figure 3: Hyperparameter: Perturbation Size Comparison
  • Figure 4: Individual Discriminatory Instance Generation Comparison
  • Figure 5: Gradient: Comprehensive Comparison
  • ...and 1 more figure

Theorems & Definitions (6)

  • Definition 1
  • Example 1
  • Definition 2
  • Definition 3
  • Example 2
  • Example 3