Investigating Disability Representations in Text-to-Image Models

Yang Yian, Yu Fan, Liudmila Zavolokina, Sarah Ebling

TL;DR

This study investigates how people with disabilities are represented in text-to-image models, focusing on SDXL and DALL·E 3. It employs a two-experiment design: (i) CLIP-based similarity analysis comparing generic prompts to category-specific disability prompts to reveal default representations, and (ii) sentiment-focused evaluation comparing model mitigations, especially across mental disorders versus physical/sensory disabilities, using automatic and human judgments. Findings show a persistent default toward mobility impairment with stronger skew in SDXL, and contrasting sentiment patterns across models, where automatic and human evaluations diverge in assessing negativity. The work highlights the need for continuous, inclusive evaluation and responsible mitigation to avoid reinforcing stereotypes, and it advocates involving disability communities to improve the inclusivity and accuracy of generated imagery.

Abstract

Text-to-image generative models have made remarkable progress in producing high-quality visual content from textual descriptions, yet concerns remain about how they represent social groups. While characteristics like gender and race have received increasing attention, disability representations remain underexplored. This study investigates how people with disabilities are represented in AI-generated images by analyzing outputs from Stable Diffusion XL and DALL·E 3 using a structured prompt design. We analyze disability representations by comparing image similarities between generic disability prompts and prompts referring to specific disability categories. Moreover, we evaluate how mitigation strategies influence disability portrayals, with a focus on assessing affective framing through sentiment polarity analysis that combines automatic and human evaluation. Our findings reveal persistent representational imbalances and highlight the need for continuous evaluation and refinement of generative models to foster more diverse and inclusive portrayals of disability.

Paper Structure

This paper contains 29 sections, 1 equation, 9 figures, 7 tables.

Figures (9)

  • Figure 1: Overview of the evaluation process for detecting disability representation differences. Image similarity between generated images using generic and specified prompts is measured with CLIP embeddings.
  • Figure 2: Overview of the evaluation process for detecting sentiment of synthesized images from different models. The comparison is made between Stable Diffusion XL and DALL·E 3.
  • Figure 3: Sample images generated from the generic and specified prompts using Stable Diffusion XL.
  • Figure 4: Sample images generated from the generic and specified prompts using DALL·E 3.
  • Figure 5: Relative similarity ($\Delta$) of the generic prompt across disability categories. Bars show mean $\Delta$ values with 95% bootstrap confidence intervals. Positive values indicate stronger alignment with the category compared to others, negative values indicate relative underrepresentation.
  • ...and 4 more figures
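
The relative similarity $\Delta$ described in the Figure 5 caption can be illustrated with a minimal sketch. The paper's exact equation is not reproduced in this excerpt, so the definition used here is an assumption: $\Delta$ for a category is the mean similarity between generic-prompt images and that category's images, minus the average of the corresponding means for all other categories. The function names (`cosine`, `relative_similarity`, `bootstrap_ci`), the percentile bootstrap, and the toy scores are hypothetical and only illustrate the idea. In practice the similarity scores would come from cosine similarity between CLIP image embeddings.

```python
import math
import random
from statistics import mean


def cosine(u, v):
    """Cosine similarity between two embedding vectors (e.g. CLIP image embeddings)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def relative_similarity(sims, category):
    """Assumed definition of Delta: mean similarity to `category`
    minus the average of the mean similarities to every other category.

    `sims` maps each category name to a list of similarity scores between
    generic-prompt images and that category's images.
    """
    own = mean(sims[category])
    others = mean(mean(v) for c, v in sims.items() if c != category)
    return own - others


def bootstrap_ci(sims, category, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for Delta (95% by default)."""
    rng = random.Random(seed)
    stats = []
    for _ in range(n_boot):
        # Resample the scores of each category with replacement.
        resampled = {c: rng.choices(v, k=len(v)) for c, v in sims.items()}
        stats.append(relative_similarity(resampled, category))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Toy similarity scores, invented for illustration only (not results from the paper).
toy_sims = {
    "mobility": [0.80, 0.82, 0.78],
    "visual": [0.60, 0.62, 0.58],
    "hearing": [0.60, 0.60, 0.60],
}
delta = relative_similarity(toy_sims, "mobility")
ci_low, ci_high = bootstrap_ci(toy_sims, "mobility")
```

Under this assumed definition, a positive `delta` means the generic prompt's images sit closer to that category than to the others on average, which matches the reading of positive and negative values given in the Figure 5 caption.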