Aequitas: A Bias and Fairness Audit Toolkit

Pedro Saleiro, Benedict Kuester, Loren Hinkson, Jesse London, Abby Stevens, Ari Anisfeld, Kit T. Rodolfa, Rayid Ghani

TL;DR

The paper addresses the need for systematic bias and fairness auditing in AI-enabled policy applications. It introduces Aequitas, an open-source toolkit that integrates into the ML workflow and provides metrics, a policy-oriented guidance framework, and a web interface to assist both data scientists and policymakers. Through three real-world case studies (criminal justice, public health, public safety), the authors demonstrate that although many models exhibit biases, standard auditing often reveals that existing practices are themselves more biased, and well-audited models can improve equity. The work emphasizes reproducibility and practical adoption, offering tools and guidelines to make fairness auditing a standard part of model development, deployment, and monitoring in public policy contexts.

Abstract

Recent work has raised concerns about the risk of unintended bias in AI systems now in use, which can affect individuals unfairly based on race, gender, or religion, among other characteristics. Although many bias metrics and fairness definitions have been proposed in recent years, there is no consensus on which metric or definition should be used, and very few resources are available to operationalize them. Therefore, despite recent awareness, auditing for bias and fairness when developing and deploying AI systems is not yet standard practice. We present Aequitas, an open-source bias and fairness audit toolkit that is an intuitive, easy-to-use addition to the machine learning workflow, enabling users to test models against several bias and fairness metrics across multiple population subgroups. Aequitas facilitates informed and equitable decisions about developing and deploying algorithmic decision-making systems for data scientists, machine learning researchers, and policymakers.
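To make "testing a model for a bias metric across population subgroups" concrete, here is a minimal sketch in plain Python. It is not the Aequitas API: the function names, the record layout `(group, predicted, actual)`, and the choice of false positive rate as the metric are all assumptions made for illustration. It computes one metric per group and then expresses each group's value as a ratio to a chosen reference group.

```python
from collections import defaultdict


def group_false_positive_rates(records):
    """Return the false positive rate for each group.

    `records` is an iterable of (group, predicted, actual) tuples with
    binary predictions and labels. This layout is a hypothetical one
    chosen for the sketch, not a format defined by the paper.
    """
    false_positives = defaultdict(int)
    negatives = defaultdict(int)
    for group, predicted, actual in records:
        if actual == 0:
            negatives[group] += 1
            if predicted == 1:
                false_positives[group] += 1
    # Groups with no actual negatives have an undefined rate and are skipped.
    return {g: false_positives[g] / n for g, n in negatives.items() if n > 0}


def disparities(rates, reference_group):
    """Express each group's rate as a ratio to the reference group's rate."""
    reference = rates[reference_group]
    if reference == 0:
        raise ValueError("reference group rate is zero; ratio is undefined")
    return {g: rate / reference for g, rate in rates.items()}


# Toy data, invented purely for illustration.
records = [
    ("A", 1, 0), ("A", 0, 0), ("A", 0, 0), ("A", 1, 1),
    ("B", 1, 0), ("B", 1, 0), ("B", 0, 0), ("B", 0, 1),
]
rates = group_false_positive_rates(records)
disp = disparities(rates, reference_group="A")
```

In this toy data, group B's false positive rate is twice that of group A, so its disparity relative to A is 2. Which metric to audit, and which group to treat as the reference, depends on the policy context.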

Paper Structure

This paper contains 19 sections, 3 equations, 8 figures.

Figures (8)

  • Figure 1: Aequitas in the larger context of the ML pipeline. Audits must be carried out internally by data scientists before evaluation and model selection. Policymakers (or clients) must audit externally before accepting a model in production, and must perform periodic audits to detect any fairness degradation over time.
  • Figure 2: Algorithmic Decision Making timeline for Public Policy and Social Good problems.
  • Figure 3: The fairness tree helps both data scientists and policymakers select the fairness metric(s) relevant to each context.
  • Figure 4: Recidivism, evaluation results per group value for selected model (Precision@150 = 0.73).
  • Figure 5: HIV project, evaluation results per group value for selected model (Precision@100 = 0.24).
  • ...and 3 more figures
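
The captions for Figures 4 and 5 report Precision@150 and Precision@100. As a rough sketch, and assuming the usual definition (the fraction of true positives among the k highest-scored entities), the quantity could be computed as follows. The function name and the toy scores are hypothetical and are not taken from the paper.

```python
def precision_at_k(scores, labels, k):
    """Fraction of positive labels among the k highest-scored items.

    Assumes the common definition of Precision@k; the paper's exact
    tie-breaking and selection rules are not given in this excerpt.
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    if not 0 < k <= len(scores):
        raise ValueError("k must be between 1 and the number of items")
    # Rank items from highest to lowest score and keep the top k.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    top_k = ranked[:k]
    return sum(label for _, label in top_k) / k


# Toy example, invented for illustration: the top 3 scores carry labels 1, 0, 1.
scores = [0.9, 0.8, 0.7, 0.4, 0.2]
labels = [1, 0, 1, 1, 0]
p_at_3 = precision_at_k(scores, labels, k=3)
```

Under this definition, Precision@150 = 0.73 would mean that roughly 73% of the 150 highest-scored individuals were true positives.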