Beyond Descriptions: A Generative Scene2Audio Framework for Blind and Low-Vision Users to Experience Vista Landscapes

Chitralekha Gupta, Jing Peng, Ashwin Ram, Shreyas Sridhar, Christophe Jouffrais, Suranga Nanayakkara

Abstract

Current scene perception tools for Blind and Low Vision (BLV) individuals rely on spoken descriptions but lack engaging representations of visually pleasing distant environmental landscapes (Vista spaces). Our proposed Scene2Audio framework generates comprehensible and enjoyable nonverbal audio using generative models informed by psychoacoustics and principles of scene audio composition. Through a user study with 11 BLV participants, we found that combining the Scene2Audio sounds with speech creates a better experience than speech alone, as the sound effects complement the speech, making the scene easier to imagine. A mobile app "in-the-wild" study with 7 BLV users for more than a week further showed the potential of Scene2Audio in enhancing outdoor scene experiences. Our work bridges the gap between visual and auditory scene perception by moving beyond purely descriptive aids, addressing the aesthetic needs of BLV users.

Paper Structure

This paper contains 48 sections, 7 figures, 4 tables.

Figures (7)

  • Figure 1: Overview of Scene2Audio Framework.
  • Figure 2: Average Rank-order preference score ($\pm$Std Dev) of Audio types. Error bars show 95% confidence intervals. Pairwise statistical significance using Wilcoxon signed rank test is shown with a * for significantly different pairs ($p<0.05$).
  • Figure 3: Overall ratings averaged across questions for each criterion: Comprehension (higher is better), Engagement (higher is better), Immersion (higher is better), and Cognitive Load (lower is better). Error bars show 95% confidence intervals. Post-hoc pairwise statistical significance is shown with a * for significantly different pairs ($p<0.05$), and o for a weakly significant difference between a pair ($p<0.10$).
  • Figure 4: Mobile App Screenshots (a) Take/Select a photo, (b) Audio playback page
  • Figure 5: Mode (in percentage) that: (a) provided the clearest info; (b) provided the least clear info; (c) provided the most enjoyable experience; (d) provided the least enjoyable experience; and (e) was overall preferred the most.
  • ...and 2 more figures