Prompt-Driven Color Accessibility Evaluation in Diffusion-based Image Generation Models

Xinyao Zhuang, Jose Echevarria, Kaan Akşit

TL;DR

This work systematically evaluates color accessibility in images generated by a common pretrained diffusion model that is prompted to improve accessibility across diverse categories, and it introduces CVDLoss, a new metric that measures differences in image gradients indicative of structural detail.

Abstract

Generative models are increasingly integrated into creative workflows. While text-to-image generation excels in visual quality and diversity, color accessibility for users with Color Vision Deficiencies (CVD) remains largely unexplored. Our work systematically evaluates color accessibility in images generated by a common pretrained diffusion model, prompted to improve accessibility across diverse categories. We quantify performance using established, off-the-shelf CVD simulation methods and introduce "CVDLoss", a new metric measuring differences in image gradients indicative of structural detail. We validate CVDLoss against a commonly used daltonization method, demonstrating its sensitivity to color accessibility modifications. Applying CVDLoss to model outputs reveals that existing diffusion models struggle to reliably respond to accessibility-focused prompts. Consequently, our study establishes CVDLoss as a valuable evaluation tool for accessibility-aware image generation and post-processing, offering insights into current generative models' limitations in addressing color accessibility.
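
To make the idea of a gradient-based discrepancy concrete, here is a minimal sketch. The abstract does not state the CVDLoss formula, so every detail below is an assumption rather than the authors' definition: the per-channel finite-difference gradient, the mean squared difference between the two gradient magnitude maps, and all function names are hypothetical. The CVD simulation is passed in as a callable; `toy_red_green_collapse` is a deliberately crude stand-in, not one of the established simulation methods the paper relies on.

```python
import numpy as np


def gradient_magnitude_map(rgb):
    """Gradient magnitude of an (H, W, 3) float image, pooled over channels.

    Assumption: central finite differences via np.gradient; the paper may use
    a different gradient operator or color space.
    """
    gy, gx = np.gradient(rgb, axis=(0, 1))
    return np.sqrt((gx ** 2 + gy ** 2).sum(axis=-1))


def cvd_loss_sketch(rgb, simulate):
    """Mean squared difference between normal-vision and simulated-CVD maps.

    `simulate` is any callable mapping an RGB image to its CVD-simulated
    version. This is an illustrative discrepancy, not the published CVDLoss.
    """
    gmm_normal = gradient_magnitude_map(rgb)
    gmm_cvd = gradient_magnitude_map(simulate(rgb))
    return float(np.mean((gmm_normal - gmm_cvd) ** 2))


def toy_red_green_collapse(rgb):
    """Toy stand-in for a CVD simulator: replace R and G by their average.

    Hypothetical and physiologically inaccurate; it only mimics the loss of
    red-green contrast so that the example is self-contained.
    """
    out = rgb.copy()
    mean_rg = rgb[..., :2].mean(axis=-1)
    out[..., 0] = mean_rg
    out[..., 1] = mean_rg
    return out


def two_tone_image(left, right, size=8):
    """Square image whose left and right halves have two flat colors."""
    img = np.empty((size, size, 3), dtype=float)
    img[:, : size // 2] = left
    img[:, size // 2 :] = right
    return img


# A red/green edge disappears under the toy simulation, so the maps differ.
red_green = two_tone_image((1.0, 0.0, 0.0), (0.0, 1.0, 0.0))
# A blue/yellow edge is untouched by the toy simulation, so the maps agree.
blue_yellow = two_tone_image((0.0, 0.0, 1.0), (1.0, 1.0, 0.0))

loss_red_green = cvd_loss_sketch(red_green, toy_red_green_collapse)
loss_blue_yellow = cvd_loss_sketch(blue_yellow, toy_red_green_collapse)
```

Under these assumptions, an edge that survives the simulation contributes nothing to the score, while an edge carried only by red-green contrast raises it. A real evaluation would swap the toy function for an established CVD simulation method.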

Paper Structure (8 sections, 1 equation, 4 figures)


Figures (4)

  • Figure 1: Qualitative illustration of CVDLoss on colorblind-aware generated images. Two representative categories are shown under normal vision and protanopia simulation, together with their corresponding Gradient Magnitude Maps (GMMs). The CVDLoss value reported in the lower-left corner quantifies the discrepancy between the normal and protanopia GMMs. Highlighted regions illustrate how local color gradients and edge structures change under protanopia.
  • Figure 2: Example images from the eight categories in our dataset.
  • Figure 3: Distribution of differences in $\log_{10} (\text{CVDLoss})$ between original images and their daltonized counterparts for protanopia (blue) and deuteranopia (orange) across categories, capturing relative changes in structural discrepancies induced by daltonization. The vertical zero line indicates no change in CVDLoss after daltonization; positive values indicate increased distortion, so negative values are preferred.
  • Figure 4: Normalized CVDLoss values for each category, prompt, and CVD in log scale. Each subplot corresponds to a category. Values are normalized by subtracting the mean of $\log_{10}(\text{CVDLoss})$ for the standard prompt (horizontal dashed line at zero). Arrows indicate the trends of accessibility improvements or reductions at the perceptual-structural level.
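
The log-scale quantities described in the captions of Figures 3 and 4 can be sketched as follows. This is an illustration only: the function names are hypothetical, the sign convention for Figure 3 (daltonized minus original) is inferred from the caption's statement that positive values indicate increased distortion, and the numbers in the usage lines are made-up placeholders, not results from the paper.

```python
import numpy as np


def log_change_after_daltonization(loss_original, loss_daltonized):
    """Figure 3 style quantity: change in log10(CVDLoss) after daltonization.

    Assumed sign: daltonized minus original, so negative values mean the
    structural discrepancy went down.
    """
    return np.log10(loss_daltonized) - np.log10(loss_original)


def normalize_to_standard_prompt(losses, standard_losses):
    """Figure 4 style quantity: log10(CVDLoss) minus the mean log10(CVDLoss)
    of the standard prompt, so the standard prompt sits at zero on average.
    """
    return np.log10(losses) - np.mean(np.log10(standard_losses))


# Placeholder values chosen for easy arithmetic (not measured data).
change = log_change_after_daltonization(1e-2, 1e-3)
normalized = normalize_to_standard_prompt(
    np.array([1e-3, 1e-1]), np.array([1e-2, 1e-2])
)
```

Working in log space turns ratios of CVDLoss into differences, which is why a tenfold reduction in the placeholder example shows up as a change of about -1.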