"I think this is fair": Uncovering the Complexities of Stakeholder Decision-Making in AI Fairness Assessment

Lin Luo, Yuri Nakao, Mathieu Chollet, Hiroya Inakoshi, Simone Stumpf

TL;DR

Stakeholders' fairness decisions are more complex than typical AI expert practices: they considered features far beyond legally protected ones, tailored metrics to specific contexts, set diverse yet stricter fairness thresholds, and even preferred designing customized fairness.

Abstract

Assessing fairness in artificial intelligence (AI) typically involves AI experts who select protected features, choose fairness metrics, and set fairness thresholds to assess outcome fairness. However, little is known about how stakeholders, particularly those affected by AI outcomes but lacking AI expertise, assess fairness. To address this gap, we conducted a qualitative study with 26 stakeholders without AI expertise, representing potential decision subjects in a credit rating scenario, to examine how they assess fairness when placed in the role of deciding on features and their priority, metrics, and thresholds. We reveal that stakeholders' fairness decisions are more complex than typical AI expert practices: they considered features far beyond legally protected features, tailored metrics for specific contexts, set diverse yet stricter fairness thresholds, and even preferred designing customized fairness. Our results extend the understanding of how stakeholders can meaningfully contribute to AI fairness governance and mitigation, underscoring the importance of incorporating stakeholders' nuanced fairness judgments.

Paper Structure

This paper contains 51 sections, 6 figures, 8 tables.

Figures (6)

  • Figure 1: Prototype System: Dashboard (also, Task 1). Component A presents the test dataset in a tabular format, where each row represents one case (i.e., applicant) of the test dataset, displaying each applicant's feature values, the ground-truth label, and the AI's predicted outcome. Component B provides AI model explanations, performance, and information about protected features. Component C contains four sub-components: Causal Graph: Training Data and Causal Graph: Test Data, which visualize feature relationships in the training and test sets, allowing participants to click on a feature to highlight its connected edges; Feature Importance, which highlights the AI model's key influencing features; and Data Distribution, which displays the feature distribution within the training set. Component D allows participants to select and rank features by using checkboxes and dragging them into their preferred order. Clicking Confirm Feature Selection & Ranking navigates to Task 2.
  • Figure 2: Prototype System: Feature-Metric Pairing Module (Task 2). Component E shows a participant viewing (Group Fairness - Equal Opportunity) for Gender. The interface provides a lay definition, a bar chart with metric results, and the metric calculation. Next, it shows a transparent individual-level visualization of the metric calculation, where the test dataset (all applicants) is sub-grouped by gender and AI prediction (i.e., "Predicted Credit is Good" and "Predicted Credit is Bad"), and then further grouped by ground-truth label ("Real Credit is Good" and "Real Credit is Bad"). The gray region denotes applicants with real bad credit, who are excluded from this metric calculation. For applicants with real good credit, the metric is calculated as the proportion of Yellow (real good & predicted good) over Yellow + White (all real good).
  • Figure 3: Heatmaps: Distribution of participants' top-selected features and their associated fairness metrics.
  • Figure 4: Conditional Statistical Parity Metric Explanation
  • Figure 5: Counterfactual Fairness Metric Explanation
  • ...and 1 more figures
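
The Equal Opportunity calculation described in the Figure 2 caption can be illustrated with a short sketch. This is not the prototype's code: the function name, the record fields (`gender`, `real_good`, `predicted_good`), and the toy applicants are all hypothetical, chosen only to mirror the caption's wording. Applicants with real bad credit (the gray region) are skipped, and for each gender group the metric is the count of real-good applicants predicted good (Yellow) divided by the count of all real-good applicants (Yellow + White).

```python
from collections import defaultdict


def equal_opportunity_rates(records, group_key="gender"):
    """Per group: share of real-good applicants that the AI predicted good."""
    real_good = defaultdict(int)       # Yellow + White: all real-good applicants
    predicted_good = defaultdict(int)  # Yellow: real good and predicted good
    for record in records:
        if not record["real_good"]:
            continue  # gray region: real bad credit is excluded from this metric
        group = record[group_key]
        real_good[group] += 1
        if record["predicted_good"]:
            predicted_good[group] += 1
    return {group: predicted_good[group] / total
            for group, total in real_good.items()}


# Hypothetical toy data, not taken from the study's dataset.
applicants = [
    {"gender": "female", "real_good": True, "predicted_good": True},
    {"gender": "female", "real_good": True, "predicted_good": False},
    {"gender": "female", "real_good": False, "predicted_good": True},
    {"gender": "male", "real_good": True, "predicted_good": True},
    {"gender": "male", "real_good": True, "predicted_good": True},
    {"gender": "male", "real_good": True, "predicted_good": False},
    {"gender": "male", "real_good": True, "predicted_good": True},
]

rates = equal_opportunity_rates(applicants)
# Absolute gap between the two groups' rates.
gap = abs(rates["female"] - rates["male"])
print(rates, gap)
```

A fairness threshold of the kind participants were asked to set would then be compared against `gap`; the threshold value itself is a judgment call and is deliberately left out of the sketch.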