FairLay-ML: Intuitive Debugging of Fairness in Data-Driven Social-Critical Software

Normen Yu, Luciana Carreon, Gang Tan, Saeid Tizpaz-Niari

TL;DR

The paper addresses the trustworthiness of data-driven, high-stakes decision systems by introducing FairLay-ML, a GUI-based toolkit for debugging fairness in such software. It combines visualizations of data interactions, a multi-objective evolutionary search for Pareto-optimal fairness-accuracy trade-offs, explanations via LIME, and counterfactual test-case generation with manual editing to reveal and analyze discriminatory instances beyond the training data. The empirical evaluation measures false positive/negative rates for counterfactual testing and includes a human study on the validity and proximity of counterfactuals, demonstrating the tool's utility in diagnosing and understanding fairness issues. The work contributes a practical, human-in-the-loop approach to fairness debugging with publicly available benchmarks and live demos, supporting safer deployment of socio-technical systems.
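To make the notion of a Pareto-optimal fairness-accuracy trade-off concrete, the following minimal sketch filters a set of candidate models down to the non-dominated ones. It illustrates only the dominance check, not the evolutionary search itself; the candidate names, the `(accuracy, unfairness)` scores, and the helper functions are hypothetical and are not taken from FairLay-ML.

```python
from typing import List, Tuple

# (name, accuracy, unfairness): higher accuracy is better, lower unfairness is better.
# All values below are made up for illustration.
Candidate = Tuple[str, float, float]


def dominates(a: Candidate, b: Candidate) -> bool:
    """Return True if a is at least as good as b on both objectives
    and strictly better on at least one."""
    no_worse = a[1] >= b[1] and a[2] <= b[2]
    strictly_better = a[1] > b[1] or a[2] < b[2]
    return no_worse and strictly_better


def pareto_front(candidates: List[Candidate]) -> List[Candidate]:
    """Keep only the candidates that no other candidate dominates."""
    return [
        c for c in candidates
        if not any(dominates(other, c) for other in candidates)
    ]


candidates = [
    ("m1", 0.85, 0.12),  # most accurate, least fair
    ("m2", 0.83, 0.05),
    ("m3", 0.80, 0.09),  # dominated by m2 on both objectives
    ("m4", 0.78, 0.02),  # least accurate, most fair
]

front = pareto_front(candidates)
print([name for name, _, _ in front])
```

Every model on the resulting front represents a different compromise: moving along it improves one objective only at the cost of the other, which is why a human has to choose among them.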

Abstract

Data-driven software solutions are widely used in critical domains with significant socio-economic, legal, and ethical implications. The rapid adoption of data-driven solutions, however, poses major threats to the trustworthiness of automated decision-support software. The primary challenges are developers' diminished understanding of the solution and historical or current biases in the datasets. To aid data-driven software developers and end-users, we present FairLay-ML, a debugging tool to test and explain the fairness implications of data-driven solutions. FairLay-ML visualizes the logic of datasets, trained models, and decisions for a given data point. In addition, it trains various models with varying fairness-accuracy trade-offs. Crucially, FairLay-ML incorporates counterfactual fairness testing that finds bugs beyond the development datasets. We conducted two studies through FairLay-ML that allowed us to measure false positives/negatives in prevalent counterfactual testing and to understand the human perception of counterfactual test cases in a class survey. FairLay-ML and its benchmarks are publicly available at https://github.com/Pennswood/FairLay-ML. The live version of the tool is available at https://fairlayml-v2.streamlit.app/. We provide a video demo of the tool at https://youtu.be/wNI9UWkywVU?t=133.
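As a rough illustration of counterfactual fairness testing, the sketch below flips only a sensitive attribute and reports the inputs whose decision changes. It is a simplified, assumed version of the general idea, not FairLay-ML's implementation: `toy_model`, the `income` and `sex` fields, and `counterfactual_failures` are all hypothetical.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Record = Dict[str, float]


def toy_model(x: Record) -> int:
    """Hypothetical, deliberately biased scorer: approval depends on income,
    with a penalty tied to the sensitive attribute 'sex'."""
    score = 0.5 * x["income"] - 1.5 * x["sex"]
    return int(score >= 2.0)


def counterfactual_failures(
    model: Callable[[Record], int],
    records: List[Record],
    sensitive: str,
    values: Sequence[float],
) -> List[Tuple[Record, Record]]:
    """Return (original, counterfactual) pairs where changing only the
    sensitive attribute changes the model's decision."""
    failures = []
    for record in records:
        baseline = model(record)
        for value in values:
            if value == record[sensitive]:
                continue  # skip the record's own value
            counterfactual = {**record, sensitive: value}
            if model(counterfactual) != baseline:
                failures.append((record, counterfactual))
                break  # one witness per record is enough
    return failures


records = [
    {"income": 10.0, "sex": 0.0},  # approved either way
    {"income": 5.0, "sex": 0.0},   # approved, but rejected once 'sex' is flipped
    {"income": 2.0, "sex": 1.0},   # rejected either way
]

failures = counterfactual_failures(toy_model, records, "sensitive" if False else "sex", [0.0, 1.0])
for original, counterfactual in failures:
    print(original, "->", counterfactual)
```

A flipped record is not guaranteed to be a realistic individual, because other features may be correlated with the sensitive attribute. That is one reason a naive flip test can report false positives or miss real discrimination, and why a human reviewing and editing the generated test cases can help.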

Paper Structure (7 sections, 3 figures)

This paper contains 7 sections, 3 figures.

Figures (3)

  • Figure 1: FairLay-ML Usage.
  • Figure 2: Visualization of Counterfactual Test Case Generations.
  • Figure 3: Steps in Generating Counterfactuals (CF) with Human Intuition.