Explanatory Debiasing: Involving Domain Experts in the Data Generation Process to Mitigate Representation Bias in AI Systems

Aditya Bhattacharya, Simone Stumpf, Robin De Croon, Katrien Verbert

TL;DR

This paper tackles representation bias in AI by introducing generic design guidelines for involving domain experts in the data generation and augmentation process. It demonstrates the guidelines through a healthcare-focused prototype and a mixed-methods study with 35 healthcare professionals. The study shows that expert involvement reduced representation bias without sacrificing model accuracy, and that it increased expert trust. The work contributes a structured, evidence-based framework (pre-, during-, and post-augmentation), actionable UI and process guidelines, and open-source artifacts for replication. It highlights how domain experts complement AI experts in debiasing, with implications for more reliable and fair AI systems in high-stakes domains.

Abstract

Representation bias is one of the most common types of biases in artificial intelligence (AI) systems, causing AI models to perform poorly on underrepresented data segments. Although AI practitioners use various methods to reduce representation bias, their effectiveness is often constrained by insufficient domain knowledge in the debiasing process. To address this gap, this paper introduces a set of generic design guidelines for effectively involving domain experts in representation debiasing. We instantiated our proposed guidelines in a healthcare-focused application and evaluated them through a comprehensive mixed-methods user study with 35 healthcare experts. Our findings show that involving domain experts can reduce representation bias without compromising model accuracy. Based on our findings, we also offer recommendations for developers to build robust debiasing systems guided by our generic design guidelines, ensuring more effective inclusion of domain experts in the debiasing process.

Paper Structure (34 sections, 10 figures, 6 tables)

Figures (10)

  • Figure 1: Screenshot of our debiasing application showing the following UI components, described in the UI components subsection: (A) System Overview, (B) Data Explorer, (C) Data Quality Overview, (D) Augmentation Controller, (E) Generated Data Controller.
  • Figure 2: Screenshot of the User-Interaction Bias Awareness component, described in the UI components subsection, which issues a warning when users try to retrain the system with generated data.
  • Figure 3: Diagram illustrating the flow of our mixed-methods user study.
  • Figure 4: Plot showing post-debiasing accuracy scores for all participants and for the different user groups. This plot presents the participant scores compared to the default model score and the naive automated approach scores.
  • Figure 5: Plot illustrating the average percentage of modifications made to the generated data across various predictor variable types. The comparison is between participants who successfully improved model accuracy after debiasing and those who could not improve the default accuracy level.
  • ...and 5 more figures