Inclusive Design of AI's Explanations: Just for Those Previously Left Out, or for Everyone?

Md Montaser Hamid, Fatima Moussaoui, Jimena Noa Guevara, Andrew Anderson, Puja Agarwal, Jonathan Dodge, Margaret Burnett

TL;DR

This study investigates whether applying GenderMag-driven inclusive design to Explainable AI (XAI) explanations yields curb-cut effects, i.e., benefits not only for underserved users but for all users. Using a between-subjects design with two versions of an MNK game prototype, the authors measure mental model concepts scores, prediction accuracy, and inclusivity among participants with no AI background. They find that the inclusivity fixes improve overall mental-model understanding and engagement with explanations (a curb-cut effect) but do not improve prediction accuracy and appear to harm it (a curb-fence effect). The improvements are strongest for Abi-like problem-solvers and women, narrowing the gender gap, though the work cautions about potential overreliance on explanations and calls for careful deployment of inclusive XAI design.

Abstract

Motivations: Explainable Artificial Intelligence (XAI) systems aim to improve users' understanding of AI, but XAI research shows many cases in which an explanation serves some users well and is unhelpful to others. In non-AI systems, some software practitioners have used inclusive design approaches, and sometimes their improvements turned out to be "curb-cut" improvements: they not only addressed the needs of underserved users but also made the products better for everyone. So, if AI practitioners used inclusive design approaches, they too might create curb-cut improvements, i.e., better explanations for everyone. Objectives: To find out, we investigated the curb-cut effects of inclusivity-driven fixes on users' mental models of AI when using an XAI prototype. The prototype and fixes came from an AI team who had adopted an inclusive design approach (GenderMag) to improve their XAI prototype. Methods: We ran a between-subjects study with 69 participants with no AI background. 34 participants used the original version of the XAI prototype and 35 used the version with the inclusivity fixes. We compared the two groups' mental model concepts scores, prediction accuracy, and inclusivity. Results: We found four main results. First, the study revealed several curb-cut effects of the inclusivity fixes: overall increased engagement with explanations and better mental model concepts scores. However (second), the inclusivity fixes did not improve participants' prediction accuracy scores; instead, they appear to have harmed them. This "curb-fence" effect (the opposite of the curb-cut effect) revealed the AI explanations' double-edged impact. Third, the AI team's inclusivity fixes brought significant improvements for users whose problem-solving styles had previously been underserved. Further (fourth), the AI team's fixes reduced the gender gap by 45%.

Paper Structure (40 sections, 2 equations, 17 figures, 9 tables)

Figures (17)

  • Figure 1: A sidewalk's curb cut.
  • Figure 2: Original prototype (which we implemented in Svelte) during a game. Participants controlled the game progression with the "Prior Move" and "Next Move" buttons. Three explanations to the right provided details on Agent Blue (X)'s decisions.
  • Figure 3: Post-GenderMag prototype during a game, in the same state as the Original prototype in Figure 2. Some fixes resulted in new additions such as the (A) Game History, (B) Game Log, and (C) Top 5 Moves. A table in the Appendix enumerates all differences between the Post-GenderMag version and the Original version.
  • Figure 4: BestToWorst in the Original prototype. Agent Blue has made three moves so there are three series of scores. The most recent highest scoring square (blue series) is highlighted along with its past scores.
  • Figure 5: Left: ScoresThroughTime in the Original prototype. Agent Blue has played 3 moves so there are 3 columns of scores and the number under each column represents the move number. Right: OnTheBoard in the Original prototype. The square highlighted in yellow is the most recent move by Agent Blue which has a high score.
  • ...and 12 more figures