FairLay-ML: Intuitive Debugging of Fairness in Data-Driven Social-Critical Software
Normen Yu, Luciana Carreon, Gang Tan, Saeid Tizpaz-Niari
TL;DR
The paper tackles the trustworthiness challenges of data-driven, high-stakes decision systems by introducing FairLay-ML, a GUI-driven debugging toolkit for fairness in data-driven software. It combines visualizations of data interactions, a multi-objective evolutionary search for Pareto-optimal fairness-accuracy trade-offs, explanations via LIME, and counterfactual test-case generation with manual editing to reveal and analyze discriminatory instances beyond training data. Empirical evaluation includes false positive/negative rates for counterfactual testing and a human study on the validity and proximity of counterfactuals, demonstrating the tool's utility in diagnosing and understanding fairness issues. The work contributes a practical, human-in-the-loop approach to fairness debugging with publicly available benchmarks and live demos, supporting safer deployment of socio-technical systems.
Abstract
Data-driven software solutions are widely used in critical domains with significant socio-economic, legal, and ethical implications. The rapid adoption of data-driven solutions, however, poses major threats to the trustworthiness of automated decision-support software. The primary challenges are the developer's diminished understanding of the solution and historical or current biases in the datasets. To aid data-driven software developers and end-users, we present FairLay-ML, a debugging tool to test and explain the fairness implications of data-driven solutions. FairLay-ML visualizes the logic of datasets, trained models, and decisions for a given data point. In addition, it trains multiple models with varying fairness-accuracy trade-offs. Crucially, FairLay-ML incorporates counterfactual fairness testing that finds bugs beyond the development datasets. We conducted two studies with FairLay-ML: one measured false positives and false negatives in prevalent counterfactual testing, and the other used a class survey to understand human perception of counterfactual test cases. FairLay-ML and its benchmarks are publicly available at https://github.com/Pennswood/FairLay-ML. The live version of the tool is available at https://fairlayml-v2.streamlit.app/. We provide a video demo of the tool at https://youtu.be/wNI9UWkywVU?t=133.
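To make the idea of counterfactual fairness testing concrete, here is a minimal sketch in Python. It assumes the common formulation in which a test case is flagged when changing only the protected attribute changes the model's decision; the abstract does not spell out FairLay-ML's exact procedure, so this is an illustration, not the tool's implementation. The names `counterfactual_flips`, `biased_model`, and the toy records are all hypothetical.

```python
from typing import Callable, Dict, List, Sequence

Record = Dict[str, int]


def counterfactual_flips(
    model: Callable[[Record], int],
    records: Sequence[Record],
    protected: str,
    values: Sequence[int] = (0, 1),
) -> List[Record]:
    """Return the records whose prediction changes when only `protected` is altered.

    Assumption: a record counts as a potential discriminatory instance if at
    least one alternative value of the protected attribute flips the decision.
    """
    flagged = []
    for rec in records:
        original = model(rec)
        for value in values:
            if value == rec[protected]:
                continue  # same value, not a counterfactual
            twin = dict(rec, **{protected: value})  # copy with one field changed
            if model(twin) != original:
                flagged.append(rec)
                break
    return flagged


def biased_model(rec: Record) -> int:
    """Hypothetical toy classifier that (wrongly) rewards the protected attribute."""
    return int(rec["income"] + 2 * rec["sex"] >= 5)


def fair_model(rec: Record) -> int:
    """Hypothetical toy classifier that ignores the protected attribute."""
    return int(rec["income"] >= 5)


records = [
    {"income": 3, "sex": 1},
    {"income": 6, "sex": 0},
    {"income": 1, "sex": 0},
    {"income": 4, "sex": 0},
]

flagged_biased = counterfactual_flips(biased_model, records, "sex")
flagged_fair = counterfactual_flips(fair_model, records, "sex")
print(len(flagged_biased), len(flagged_fair))
```

In this toy setting only records near the decision boundary are flagged for the biased model, and none are flagged for the model that ignores the protected attribute. A check like this can still mislabel cases (for example, when the flipped record is not a realistic individual), which is the kind of false positive and false negative the studies above set out to measure.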
