EARN Fairness: Explaining, Asking, Reviewing, and Negotiating Artificial Intelligence Fairness Metrics Among Stakeholders

Lin Luo, Yuri Nakao, Mathieu Chollet, Hiroya Inakoshi, Simone Stumpf

TL;DR

The paper addresses fragmentation in AI fairness by giving stakeholders without AI expertise a way to have fairness metrics Explained to them, be Asked for their personal preferences, Review metrics collectively, and Negotiate a consensus. It introduces the EARN Fairness framework, which pairs an interactive system, the Fairness Explainer and Explorer (FEE), with a two-phase process that first elicits personal preferences and then seeks consensus. A credit-rating study with 18 lay participants shows diverse metric preferences and demonstrates negotiation toward a hybrid consensus that balances subgroup and individual fairness. The work offers design recommendations and a path toward scalable, human-centered fairness practices in high-risk AI applications.

Abstract

Numerous fairness metrics have been proposed and employed by artificial intelligence (AI) experts to quantitatively measure bias and define fairness in AI models. Recognizing the need to accommodate stakeholders' diverse fairness understandings, efforts are underway to solicit their input. However, conveying AI fairness metrics to stakeholders without AI expertise, capturing their personal preferences, and seeking a collective consensus remain challenging and underexplored. To bridge this gap, we propose a new framework, EARN Fairness, which facilitates collective metric decisions among stakeholders without requiring AI expertise. The framework features an adaptable interactive system and a stakeholder-centered EARN Fairness process to Explain fairness metrics, Ask stakeholders' personal metric preferences, Review metrics collectively, and Negotiate a consensus on metric selection. To gather empirical results, we applied the framework to a credit rating scenario and conducted a user study involving 18 decision subjects without AI knowledge. We identified their personal metric preferences and their acceptable level of unfairness in individual sessions. Subsequently, we uncovered how they reached metric consensus in team sessions. Our work shows that the EARN Fairness framework enables stakeholders to express personal preferences and reach consensus, providing practical guidance for implementing human-centered AI fairness in high-risk contexts. Through this approach, we aim to harmonize fairness expectations of diverse stakeholders, fostering more equitable and inclusive AI fairness.

Paper Structure (40 sections, 5 figures, 8 tables)

Figures (5)

  • Figure 1: Fairness Explainer and Explorer: UI Dashboard.
  • Figure 2: Static Fairness Exploration Component (Explanation Views for Group Fairness). This component is activated when users click "VIEW EXPLANATIONS" in the group fairness panel of the UI dashboard (Figure 1), showing detailed explanatory views for each group fairness metric. C1 shows a user selecting the "Age" protected feature to view explanations for all group fairness metrics. C2 depicts a user clicking "CHECK INDIVIDUAL INSTANCES" to examine the instance-level explanation for the Predictive Equality metric.
  • Figure 3: Another example of the instance-level explanation views for group fairness metrics, here for the protected feature "Gender". After users set the condition to "Job" and click "CHECK INDIVIDUAL INSTANCES" for Conditional Statistical Parity (in Figure 2, C1), they see the corresponding instance-level explanation. Users can switch to other conditions, such as "Credit History", using radio buttons.
  • Figure 4: Static Fairness Exploration Component (C3: Explanation Views for Subgroup Fairness; C4 and C5: Explanation Views for Individual Fairness) and Dynamic Fairness Exploration Component (C6).
  • Figure 5: The ranking distribution of fairness metrics chosen by participants. The y-axis represents metrics, the x-axis represents three ranks (TOP-1, TOP-2, TOP-3), with color intensity indicating selection frequency. It shows that Conditional Statistical Parity, Consistency, and Counterfactual Fairness are the most highly ranked metrics.
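
The captions above name several group fairness metrics without defining them. As a rough illustration, two of them can be sketched under their common textbook definitions: Predictive Equality compares false positive rates between groups, and Conditional Statistical Parity compares positive prediction rates between groups within each value of a conditioning feature such as "Job". The records, function names, and group labels below are hypothetical, and the paper's own formulation may differ in detail.

```python
from collections import defaultdict

# Hypothetical toy data, not taken from the paper's study.
# Each record: (gender, job, true_label, predicted_label); 1 = good credit.
records = [
    ("female", "skilled", 1, 1),
    ("female", "skilled", 0, 0),
    ("female", "unskilled", 0, 1),
    ("female", "unskilled", 1, 1),
    ("male", "skilled", 1, 1),
    ("male", "skilled", 0, 1),
    ("male", "unskilled", 0, 1),
    ("male", "unskilled", 1, 1),
]


def positive_rate(rows):
    """Share of rows that received a positive prediction."""
    return sum(r[3] for r in rows) / len(rows)


def false_positive_rate(rows):
    """Share of truly negative rows that were predicted positive."""
    negatives = [r for r in rows if r[2] == 0]
    return sum(r[3] for r in negatives) / len(negatives)


def predictive_equality_gap(rows, group_a, group_b):
    """Difference in false positive rates between two gender groups."""
    a = [r for r in rows if r[0] == group_a]
    b = [r for r in rows if r[0] == group_b]
    return false_positive_rate(a) - false_positive_rate(b)


def conditional_statistical_parity_gaps(rows, group_a, group_b):
    """Difference in positive rates between two groups, per job category."""
    by_job = defaultdict(list)
    for r in rows:
        by_job[r[1]].append(r)
    gaps = {}
    for job, job_rows in by_job.items():
        a = [r for r in job_rows if r[0] == group_a]
        b = [r for r in job_rows if r[0] == group_b]
        if a and b:  # skip strata where one group is absent
            gaps[job] = positive_rate(a) - positive_rate(b)
    return gaps


print(predictive_equality_gap(records, "female", "male"))
print(conditional_statistical_parity_gaps(records, "female", "male"))
```

On this toy data the false positive rate gap is -0.5, while the conditional gap is -0.5 for "skilled" and 0.0 for "unskilled". Conditioning can therefore show a disparity in one stratum and none in another, which is the kind of difference the instance-level views are meant to surface.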