Empowering Affected Individuals to Shape AI Fairness Assessments: Processes, Criteria, and Tools
Lin Luo, Satwik Ghanta, Yuri Nakao, Mathieu Chollet, Simone Stumpf
TL;DR
AI fairness assessments have traditionally been conducted by experts or regulators using predefined metrics, often neglecting the fairness notions of those affected by the decisions. This study works with 18 participants as decision subjects in a credit rating scenario and uses an interactive prototype to ground lay fairness notions in model features and translate them into concrete, operational criteria. It documents a two-phase process (grounding notions, then translating them into metrics) and reveals a diverse set of criteria across outcome and procedural fairness, including custom and combined metrics. The findings yield design implications for processes and tools that support inclusive, value-sensitive fairness assessment, and suggest that eliciting stakeholder-driven fairness criteria is feasible.
Abstract
AI systems are increasingly used in high-stakes domains such as credit rating, where fairness concerns are critical. Existing fairness assessments are typically conducted by AI experts or regulators using predefined protected attributes and metrics, which often fail to capture the diversity and nuance of fairness notions held by the individuals affected by these systems' decisions, such as decision subjects. Recent work has therefore called for involving affected individuals in fairness assessment, yet little empirical evidence exists on how they create their own fairness criteria or what kinds of criteria they produce. Such knowledge could not only inform experts' fairness evaluation and mitigation, but also guide the design of AI assessment tools. We address this gap through a qualitative user study with 18 participants in a credit rating scenario. Participants first articulated their fairness notions in their own words, then used an interactive prototype we designed to turn those notions into concrete, quantified, and operationalized fairness criteria. Our findings provide empirical evidence of the process through which people's fairness notions emerge via grounding in model features, and uncover a diverse set of individuals' custom-defined criteria for both outcome and procedural fairness. We provide design implications for processes and tools that support more inclusive and value-sensitive AI fairness assessment.
