My Model is Unfair, Do People Even Care? Visual Design Affects Trust and Perceived Bias in Machine Learning

Aimen Gaba; Zhanna Kaufman; Jason Chueng; Marie Shvakel; Kyle Wm. Hall; Yuriy Brun; Cindy Xiong Bearfield

TL;DR

The paper tackles how visualization design shapes trust and perceived bias in ML by conducting three crowd-sourced trust-game experiments that vary visual format, textual annotations, and stakeholder context. It shows that bar charts and text differently modulate fairness-trust trade-offs, with gender differences and explicit bias warnings significantly affecting decisions. The authors identify seven reasoning strategies, demonstrate the generalizability (and limits) of visualization effects across styles, and provide design recommendations for fairness visualization tools. Together, these findings offer concrete guidance for building ML fairness visualizations that support diverse users in real-world decision-making. The work advances empirical knowledge on visualization design in ML fairness and points to practical implications for MLOps, visualization design, and decision support systems.

Abstract

Machine learning technology has become ubiquitous, but, unfortunately, often exhibits bias. As a consequence, disparate stakeholders need to interact with and make informed decisions about using machine learning models in everyday systems. Visualization technology can support stakeholders in understanding and evaluating trade-offs between, for example, accuracy and fairness of models. This paper aims to empirically answer "Can visualization design choices affect a stakeholder's perception of model bias, trust in a model, and willingness to adopt a model?" Through a series of controlled, crowd-sourced experiments with more than 1,500 participants, we identify a set of strategies people follow in deciding which models to trust. Our results show that men and women prioritize fairness and performance differently and that visual design choices significantly affect that prioritization. For example, women trust fairer models more often than men do, participants value fairness more when it is explained using text than as a bar chart, and being explicitly told a model is biased has a bigger impact than showing past biased performance. We test the generalizability of our results by comparing the effect of multiple textual and visual design choices and offer potential explanations of the cognitive mechanisms behind the difference in fairness perception and trust. Our research guides design considerations to support future work developing visualization systems for machine learning.

Paper Structure (32 sections, 6 figures)

Introduction
Related Work
Experiment 1: People's Trust in ML Models
Study Design
Procedure
Hypotheses and Expected Outcomes
Participants
General Analysis Approach
RQ1: Effects on Men's and Women's Trust
RQ2: Effect of Choosing For Yourself vs. a Client
RQ3: Effect of Model Performance
RQ4: Effect of Visual Representation
RQ5: Demographics and Personal Characteristics
RQ6: Reasoning Strategies
Strategies
...and 17 more sections

Figures (6)

Figure 1: Our study answers seven research questions to understand people's trust in ML models.
Figure 2: An example question using the bar chart representation.
Figure 3: Mean subset plots and logistic regression lines for bootstrapped results data. Data where participants are investing on behalf of a client is labeled "Client" (teal), and data where they are investing on their own behalf is separated by gender and labeled "Self - Men" (blue) and "Self - Wom" (orange). The X-axis represents the difference between returns to men and women by the biased model, and the Y-axis represents the percentage of participants who chose the fair model for each bootstrapped data set. Subplots are separated by representation type (bar vs. text) and by whether the biased model returns more to the participant's gender or the other gender. The square, triangle, and diamond represent the three possible extreme behaviors that a participant might exhibit: gender-aware, maximizing profit; gender-blind, maximizing profit; and maximizing fairness, respectively (recall the "Hypotheses and Expected Outcomes" section).
Figure 4: The fraction of participants using each strategy who chose the fair or biased model in Experiment 1 (left) and Experiment 3 (right).
Figure 5: The results of Experiment 2, along with the style of the 8 textual and 4 bar chart representations. The visualizations enclosed in the orange box were used in Experiment 1. The orange dots show the percentage of women who chose the fair model, and the blue dots show the percentage of men who did so. The square, triangle, and diamond represent the three possible extreme behaviors that a participant might exhibit: gender-aware, maximizing profit; gender-blind, maximizing profit; and maximizing fairness, respectively (recall the "Hypotheses and Expected Outcomes" section).
...and 1 more figure
