"I followed what felt right, not what I was told": Autonomy, Coaching, and Recognizing Bias Through AI-Mediated Dialogue

Atieh Taheri, Hamza El Alaoui, Patrick Carrington, Jeffrey P. Bigham

Abstract

Ableist microaggressions remain pervasive in everyday interactions, yet interventions to help people recognize them are limited. We present an experiment testing how AI-mediated dialogue influences recognition of ableism. A total of 160 participants completed a pre-test, an intervention, and a post-test across four conditions: AI nudges toward bias (Bias-Directed), AI nudges toward inclusion (Neutral-Directed), unguided dialogue (Self-Directed), and a text-only, non-dialogue control (Reading). Participants rated scenarios on standardness of social experience and emotional impact; those in dialogue-based conditions also provided qualitative reflections. Quantitative results showed that dialogue-based conditions produced stronger recognition than Reading, though trajectories diverged: biased nudges improved differentiation of bias from neutrality but increased overall negativity. Conditions with inclusive nudges or no nudges remained more balanced, while Reading participants showed weaker gains and, in some cases, declines. Qualitative findings revealed that biased nudges were often rejected, while inclusive nudges were adopted as scaffolding. We contribute a validated vignette corpus, an AI-mediated intervention platform, and design implications highlighting the trade-offs conversational systems face when integrating bias-related nudges.

Paper Structure (58 sections, 5 figures, 4 tables)

Figures (5)

  • Figure 1: System architecture of the study platform. Participants interacted through a browser-based front end supporting avatar creation, a dialogue intervention interface, and a reading module. The Flask backend handled condition assignment, prompt pipelines, and data integration, exchanging information asynchronously with the front end via JSON. User state and conversation history were persisted to and retrieved from the database. LLM services powered the intervention: GPT-4o generated both the replies of the virtual character, a person with a disability (PwD), and the coaching suggestions, while DALL·E generated avatars from user-provided features.
  • Figure 2: Dialogue Interface. The system includes (A) Scenario prompt introducing the social setting, with a toggle button to expand or collapse. (B) Pre-scripted dialogues between the virtual character (e.g., Alex) and the user. (C) User responses as part of the conversation. (D) AI-generated continuation from the character. (E) Private coaching suggestion visible only to the user, offering guidance. (F) User response input box with Send button. (G) Navigation controls. (H) Termination button.
  • Figure 3: Study procedure across two sessions. On Day 1, participants completed a Demographic Questionnaire and a Pre-Test Vignette Survey (20 scenarios: 10 ableist, 10 neutral). On Day 6, they returned for the assigned intervention (three dialogue-based conditions: Bias-Directed, Neutral-Directed, or Self-Directed, presented in either a Party or Work Office setting; or a passive Reading control), followed by a Post-Interaction Reflection (dialogue conditions only) and a Post-Test Vignette Survey (20 new scenarios matched in structure to the pre-test).
  • Figure 4: Change in ratings of (A) ableist scenarios, (B) neutral scenarios, and (C) all scenarios combined for Q1 ("standard social experience") and Q2 ("emotional impact"). Bars represent mean change from pre- to post-study across the four conditions (Bias-Directed, Neutral-Directed, Self-Directed, and Reading). Error bars indicate the standard error of the mean (SEM).
  • Figure 5: Change in contrast scores (Neutral $-$ Ableist) for Q1 (Standard Social Experience) and Q2 (Emotional Impact). Higher values indicate greater differentiation between neutral and ableist scenarios. Error bars show SEM.
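
The quantities named in the captions for Figures 4 and 5 (mean pre-to-post change, contrast score as Neutral minus Ableist, and SEM error bars) can be illustrated with a minimal sketch. This is not the authors' analysis code: the function names, the per-participant aggregation, and the ratings below are all hypothetical and chosen only to show the arithmetic.

```python
from math import sqrt
from statistics import mean, stdev


def contrast(neutral_ratings, ableist_ratings):
    """Contrast score: mean neutral rating minus mean ableist rating."""
    return mean(neutral_ratings) - mean(ableist_ratings)


def contrast_change(pre, post):
    """Change in contrast score from pre-test to post-test for one participant."""
    return contrast(post["neutral"], post["ableist"]) - contrast(
        pre["neutral"], pre["ableist"]
    )


def sem(values):
    """Standard error of the mean: sample standard deviation / sqrt(n)."""
    return stdev(values) / sqrt(len(values))


# Hypothetical ratings for three participants in one condition
# (assumption: each participant's contrast is computed first, then averaged).
participants = [
    {"pre": {"neutral": [5, 5], "ableist": [4, 4]},
     "post": {"neutral": [6, 6], "ableist": [2, 2]}},
    {"pre": {"neutral": [6, 6], "ableist": [5, 5]},
     "post": {"neutral": [6, 6], "ableist": [4, 4]}},
    {"pre": {"neutral": [4, 4], "ableist": [4, 4]},
     "post": {"neutral": [5, 5], "ableist": [3, 3]}},
]

changes = [contrast_change(p["pre"], p["post"]) for p in participants]
mean_change = mean(changes)   # height of a bar in a Figure 5 style plot
sem_change = sem(changes)     # half-length of the corresponding error bar
```

Under this reading, a positive `mean_change` means participants in the condition separated neutral from ableist scenarios more sharply after the intervention than before it.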