ProgramAlly: Creating Custom Visual Access Programs via Multi-Modal End-User Programming

Jaylin Herskovitz, Andi Xu, Rahaf Alharbi, Anhong Guo

TL;DR

ProgramAlly addresses the lack of customization in visual assistive technology for blind users by letting them create their own visual filtering programs through end-user programming. It combines block-based, natural language, and programming-by-example interfaces with a general representation of filtering tasks, backed by on-device and cloud models, and was evaluated in a study with 12 blind adults. Participants preferred different modalities for different tasks, and the study surfaced design considerations including learnability, ambiguity in natural language, and the balance between specificity and reusability. The work contributes a platform for DIY accessibility and insight into how end-user programming could broaden the range of tasks that assistive AI supports, while raising questions about future automation and integration with large vision-language models.

Abstract

Existing visual assistive technologies are built for simple and common use cases, and have few avenues for blind people to customize their functionalities. Drawing from prior work on DIY assistive technology, this paper investigates end-user programming as a means for users to create and customize visual access programs to meet their unique needs. We introduce ProgramAlly, a system for creating custom filters for visual information, e.g., 'find NUMBER on BUS', leveraging three end-user programming approaches: block programming, natural language, and programming by example. To implement ProgramAlly, we designed a representation of visual filtering tasks based on scenarios encountered by blind people, and integrated a set of on-device and cloud models for generating and running these programs. In user studies with 12 blind adults, we found that participants preferred different programming modalities depending on the task, and envisioned using visual access programs to address unique accessibility challenges that are otherwise difficult with existing applications. Through ProgramAlly, we present an exploration of how blind end-users can create visual access programs to customize and control their experiences.

Paper Structure (48 sections, 4 figures, 2 tables)

Figures (4)

  • Figure 1: ProgramAlly's main components: (1a) an underlying program representation and (1b) the framework for running visual filtering programs; (2) a set of three multi-modal programming interfaces to support programmers with different levels of expertise; (3) a program generation server that synthesizes filtering programs from images or natural language.
  • Figure 2: In explore mode, ProgramAlly provides a list of all items detected in the camera feed. Users then demonstrate filtering by choosing a specific item. That item is then used to fetch a specific branch from a scene hierarchy, which becomes the program.
  • Figure 3: Samples of props used in our study: (a) Grocery props for in-person testing of 'find DATE on GROCERY ITEM', (b) Mail props for in-person testing of 'find ADDRESS on PACKAGE', (c) Images used by remote participants, for testing 'find NUMBER on BUS' and 'find PERSON on BENCH'.
  • Figure 4: Participants rated each of ProgramAlly's three creation modes on a set of factors. Charts (A) and (B) show the trade-offs between block and question mode: question mode was rated easiest, but block mode was perceived to be slightly more accurate. Chart (C) shows that block mode had the steepest learning curve, though participants were able to create correct programs with all three modes. Each mode may be suited to different users or scenarios.
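
The explore-mode flow described in the Figure 2 caption (list the detected items, let the user pick one, fetch that item's branch from a scene hierarchy, and use the branch as the program) can be illustrated with a minimal sketch. This is not ProgramAlly's implementation: the `SceneNode` class, the function names, the sample scene, and the way a branch is rendered as a 'find X on Y' string are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class SceneNode:
    """Hypothetical scene-hierarchy node; children are items found on or inside it."""
    label: str
    children: List["SceneNode"] = field(default_factory=list)


def list_items(root: SceneNode) -> List[str]:
    """Depth-first list of every detected label below the root (the explore-mode list)."""
    labels = []
    for child in root.children:
        labels.append(child.label)
        labels.extend(list_items(child))
    return labels


def branch_to(root: SceneNode, target: str) -> Optional[List[str]]:
    """Labels on the path from just below the root down to the chosen item, or None."""
    for child in root.children:
        if child.label == target:
            return [child.label]
        deeper = branch_to(child, target)
        if deeper is not None:
            return [child.label] + deeper
    return None


def branch_to_program(branch: List[str]) -> str:
    """Render a branch as a 'find X on Y' filter string, innermost item first."""
    innermost_first = [label.upper() for label in reversed(branch)]
    return "find " + " on ".join(innermost_first)


# Assumed sample scene: a bus with a number and some text, and a bench with a person.
scene = SceneNode("scene", [
    SceneNode("bus", [SceneNode("number"), SceneNode("text")]),
    SceneNode("bench", [SceneNode("person")]),
])

# The user "demonstrates" the filter by choosing one item from the list.
program = branch_to_program(branch_to(scene, "number"))
```

Here the chosen item `number` sits under `bus`, so the extracted branch yields the filter 'find NUMBER on BUS', matching the example filter in the abstract. A real system would presumably build the hierarchy from detection model output rather than a hand-written tree.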