AltCanvas: A Tile-Based Image Editor with Generative AI for Blind or Visually Impaired People

Seonghee Lee, Maho Kohga, Steve Landau, Sile O'Modhrain, Hari Subramonyam

TL;DR

AltCanvas addresses the challenge of enabling visually impaired users to author complex visual scenes by combining a tile-based constructive interface with generative AI and multimodal feedback. The approach provides precise spatial control through relative tile placement, speech prompts, and sonification, while offering AI-assisted generation and a tactile rendering pathway. Through formative and usability studies with 14 participants, the authors demonstrate that users can compose, edit, and finalize illustrations and tactile graphics with meaningful control and feedback, highlighting potential for broader accessibility and education. The work advances accessible AI-assisted creativity and suggests a path toward more inclusive design in visual content tools and AI-enabled authoring workflows.

Abstract

People with visual impairments often struggle to create content that relies heavily on visual elements, particularly when conveying spatial and structural information. Existing accessible drawing tools, which construct images line by line, are suitable for simple tasks like math but not for more expressive artwork. On the other hand, emerging generative AI-based text-to-image tools can produce expressive illustrations from descriptions in natural language, but they lack precise control over image composition and properties. To address this gap, our work integrates generative AI with a constructive approach that provides users with enhanced control and editing capabilities. Our system, AltCanvas, features a tile-based interface enabling users to construct visual scenes incrementally, with each tile representing an object within the scene. Users can add, edit, move, and arrange objects while receiving speech and audio feedback. Once completed, the scene can be rendered as a color illustration or as a vector for tactile graphic generation. Involving 14 blind or low-vision users in design and evaluation, we found that participants effectively used the AltCanvas workflow to create illustrations.

Paper Structure

This paper contains 69 sections, 9 figures, and 2 tables.

Figures (9)

  • Figure 1: The participants' previous works, including a pen and crayon drawing, the original grid layout for placing images, and an artwork of a bridge written in SVG code. On the left, P2 showed us drawings made with ballpoint pens and Crayola crayons on paper. P3 showed us the process of using a grid layout to create shape cutouts. On the right, P5 showed us an example artwork of the Golden Gate Bridge written in SVG code.
  • Figure 2: Main Interface. (A) Region of the tile-based interface. (B) Region of the canvas interface. (C) Region of the keyboard shortcut commands. (D) System settings with canvas size, image style, and speech speed. Screen (D) pops up initially so that the user can configure the settings. Users can open the keyboard commands screen (C) by pressing SHIFT + K. Users can navigate between regions (A) and (B) while editing the image.
  • Figure 3: Image Descriptions. (A) Global description of a canvas with multiple objects on it. The user can press SHIFT + G to hear the global description of the canvas. (B) Local description: the user presses SHIFT + I to hear a description of the image on the current tile. (C) Image chat: accessed through SHIFT + C, this feature answers the user's questions about the image on the current tile.
  • Figure 4: Image Editing Operations. Edit image locations with SHIFT + L and arrow keys: a thump sound indicates edge collision (A), speech notifies object collision (B), and spatial sounds and coordinates describe movement (C). SHIFT + S adjusts size with variable-frequency tones (D): the frequency rises as the size increases and falls as the size decreases. Tile manipulation and radar scan reveal layout and distances (E, F).
  • Figure 5: Final Image Rendering. The figure illustrates the background rendering options and edge detection features in AltCanvas. Four different rendered backgrounds show how users can select various scenes for their final image composition. The second rendering operation shows the tactile rendering options available to the user.
  • ...and 4 more figures
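
The audio feedback described in the Figure 4 caption (a tone whose frequency tracks object size, and a cue when an object hits the canvas edge) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration, not the authors' implementation: the names `size_to_frequency`, `move`, and `CanvasObject`, the linear mapping, the 220 to 880 Hz range, the size limits, and the top-left coordinate convention are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical ranges; the source does not state the actual values.
MIN_SIZE, MAX_SIZE = 50.0, 500.0          # object side length in pixels
MIN_FREQ_HZ, MAX_FREQ_HZ = 220.0, 880.0   # tone range for size feedback


def size_to_frequency(size: float) -> float:
    """Map object size to a tone frequency: larger size, higher pitch.

    A linear mapping is assumed here; the paper may use a different curve.
    """
    clamped = max(MIN_SIZE, min(MAX_SIZE, size))
    t = (clamped - MIN_SIZE) / (MAX_SIZE - MIN_SIZE)
    return MIN_FREQ_HZ + t * (MAX_FREQ_HZ - MIN_FREQ_HZ)


@dataclass(frozen=True)
class CanvasObject:
    x: float      # left edge, pixels from the canvas's left side
    y: float      # top edge, pixels from the canvas's top side
    size: float   # side length of a square bounding box (assumed)


def move(obj: CanvasObject, dx: float, dy: float,
         canvas_w: float, canvas_h: float) -> tuple[CanvasObject, bool]:
    """Move an object, keeping it on the canvas.

    Returns the moved object and a flag that is True when the requested
    move was cut short by a canvas edge (the case where an interface
    would play an edge-collision sound).
    """
    want_x, want_y = obj.x + dx, obj.y + dy
    new_x = min(max(want_x, 0.0), canvas_w - obj.size)
    new_y = min(max(want_y, 0.0), canvas_h - obj.size)
    hit_edge = (new_x != want_x) or (new_y != want_y)
    return CanvasObject(new_x, new_y, obj.size), hit_edge
```

In this sketch, a caller would play a tone at `size_to_frequency(obj.size)` after each resize step and trigger the edge cue whenever `move` returns `True` as its second value. Object-to-object collision and spatialized movement sounds are left out to keep the example short.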