RISE: Interactive Visual Diagnosis of Fairness in Machine Learning Models

Ray Chen, Christan Grant

TL;DR

RISE tackles the challenge of assessing fairness under domain shift where scalar metrics fail to reveal where disparities arise. It proposes an interactive visualization that sorts prediction residuals into a residual curve, linking patterns to formal fairness notions through knee detection and three indicators: $\mathcal{F}_{\text{mean}}$, $\mathcal{F}_{\text{shift}}$, and $\mathcal{F}_{\text{acc}}$. This approach enables localized disparity diagnosis, cross-environment subgroup analysis, and exposure of accuracy–fairness trade-offs that metrics miss, supporting more informed model selection and deployment. Demonstrations on the BDD100K driving dataset show that RISE can reveal localized biases even when aggregate metrics look favorable, guiding practitioners toward balanced, fairer systems. Overall, RISE provides a perception-informed, post-hoc diagnostic interface that complements existing fairness toolkits and facilitates actionable model analysis across modalities.

Abstract

Evaluating fairness under domain shift is challenging because scalar metrics often obscure exactly where and how disparities arise. We introduce RISE (Residual Inspection through Sorted Evaluation), an interactive visualization tool that converts sorted residuals into interpretable patterns. By connecting residual curve structures to formal fairness notions, RISE enables localized disparity diagnosis, subgroup comparison across environments, and the detection of hidden fairness issues. Through post-hoc analysis, RISE exposes accuracy-fairness trade-offs that aggregate statistics miss, supporting more informed model selection.

Paper Structure (8 sections, 4 equations, 2 figures, 2 tables)


Figures (2)

  • Figure 1: RISE interface on ccMNIST [lecun2002gradient], with MBDG [robey2021model] shown. (A) Group coloring for sensitive attributes/environments. (B) Twin knees (convex $\blacklozenge$, concave $\bigstar$) per group. (C) Median rulers: overall vs. group alignment (drives $\mathcal{F}_{\text{mean}}$). (D) Adaptive segmentation reveals local disparities. Tight, parallel segments indicate good fairness; curves nearer the x-axis indicate higher accuracy.
  • Figure 2: Visual signatures of different algorithms on the BDD100K dataset as revealed by RISE. Each plot illustrates the model's accuracy-fairness trade-off. (A) IGA: The curve is close to the x-axis (high accuracy) with minor separation between groups, indicating a balanced trade-off. (B) IRM: The tightly clustered curve, far from the x-axis, indicates consistent errors across groups (high fairness, $\mathcal{F}_{\text{acc}}/\mathcal{F}_{\text{shift}} \approx 0$) but low accuracy. (C) MBDG: While overall accuracy is highest, the distorted curve signals significant disparities hidden by aggregate metrics.
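
Panel (C) of Figure 1 compares an overall median ruler with per-group medians. As a rough illustration of that comparison, the sketch below computes each group's median residual and its signed gap from the overall median. This is an assumption-laden stand-in: the exact definition of $\mathcal{F}_{\text{mean}}$ is not given here, and the helper `median_gaps` is a hypothetical name, not part of RISE.

```python
import numpy as np

def median_gaps(residuals, groups):
    """Return the overall median residual and each group's signed gap from it.

    Illustrative only; not the paper's definition of F_mean.
    """
    residuals = np.asarray(residuals, dtype=float)
    labels = list(groups)
    overall = float(np.median(residuals))
    gaps = {}
    for g in sorted(set(labels)):
        mask = np.array([lab == g for lab in labels])
        gaps[g] = float(np.median(residuals[mask])) - overall
    return overall, gaps

# Toy residuals for two groups with clearly different error levels.
overall, gaps = median_gaps(
    [0.1, 0.2, 0.3, 1.0, 1.2, 1.4],
    ["a", "a", "a", "b", "b", "b"],
)
```

Gaps near zero for every group correspond to rulers that line up; a large positive gap marks a group whose typical error exceeds the overall level.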