Steering Generative Models for Accessibility: EasyRead Image Generation

Nicolas Dickenmann, Yanis Merzouki, Sonia Laguna, Thy Nowak-Tran, Emanuele Palumbo, Julia E. Vogt, Gerda Binder

Abstract

EasyRead pictograms are simple, visually clear images that represent specific concepts and support comprehension for people with intellectual disabilities, low literacy, or language barriers. The large-scale production of EasyRead content has traditionally been constrained by the cost and expertise required to manually design pictograms. In contrast, automatic generation of such images could significantly reduce production time and cost, enabling broader accessibility across digital and printed materials. However, modern diffusion-based image generation models tend to produce outputs that exhibit excessive visual detail and lack stylistic stability across random seeds, limiting their suitability for clear and consistent pictogram generation. This challenge highlights the need for methods specifically tailored to accessibility-oriented visual content. In this work, we present a unified pipeline for generating EasyRead pictograms by fine-tuning a Stable Diffusion model using LoRA adapters on a curated corpus that combines augmented samples from multiple pictogram datasets. Since EasyRead pictograms lack a unified formal definition, we introduce an EasyRead score to benchmark pictogram quality and consistency. Our results demonstrate that diffusion models can be effectively steered toward producing coherent EasyRead-style images, indicating that generative models can serve as practical tools for scalable and accessible pictogram production.

Paper Structure (31 sections, 1 equation, 4 figures, 8 tables)

Figures (4)

  • Figure 1: Our pipeline to generate EasyRead pictograms. Training (Left): Input data is preprocessed, captioned using BLIP and augmented before fine-tuning Stable Diffusion v1.5 using Low-Rank Adaptation (LoRA). A unique instance token (sks) is used to learn the EasyRead style. Inference (Right): The model generates new pictograms by combining the learned style token (sks) with a descriptive prompt and specific color constraints to ensure stylistic consistency.
  • Figure 2: Qualitative sample of four generations from our model at different seeds with the prompt: "The eifeltower in Paris; background color: white; skin color: white; hair color: black."
  • Figure 3: Four samples at different seeds generated by the closed-source Nano Banana Pro model with the prompt: "A pictogram of the skyline of New York City at night."
  • Figure 4: Distributional comparison of EasyRead metrics against the COCO 2017 baseline. The top panel illustrates the distribution of the aggregate EasyRead score, showing a distinct shift toward higher normalized scores for the EasyRead dataset (blue) compared to COCO 2017 (orange). The bottom panels break down the comparison across six constituent sub-metrics (Palette, Edge, Saliency, Contrast, Stroke, and Centering).
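
The inference-time prompt format described in Figures 1 and 2 (style token, descriptive text, and explicit color constraints) can be sketched as a small helper. This is only an illustration under stated assumptions: the function name `build_easyread_prompt`, its default values, and the placement of the `sks` token at the start of the prompt are hypothetical and not taken from the paper; only the semicolon-separated color fields mirror the prompt shown in Figure 2.

```python
def build_easyread_prompt(
    description: str,
    background: str = "white",
    skin: str = "white",
    hair: str = "black",
    style_token: str = "sks",
) -> str:
    """Assemble a prompt string in the style of the Figure 2 example.

    Assumption: the learned style token is prepended to the description.
    The source only states that the token is combined with the prompt,
    not where it is placed.
    """
    description = description.strip()
    if not description:
        raise ValueError("description must not be empty")

    # Semicolon-separated fields, following the format shown in Figure 2.
    parts = [
        f"{style_token} {description}",
        f"background color: {background}",
        f"skin color: {skin}",
        f"hair color: {hair}",
    ]
    return "; ".join(parts)


if __name__ == "__main__":
    # Hypothetical description, chosen only for illustration.
    print(build_easyread_prompt("A person reading a book"))
```

Keeping the color constraints as explicit fields, rather than free text inside the description, makes it easy to hold them fixed across seeds, which is the stylistic consistency the Figure 1 caption refers to.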