The Relative Importance of Depth Cues and Semantic Edges for Indoor Mobility Using Simulated Prosthetic Vision in Immersive Virtual Reality

Alex Rasla, Michael Beyeler

TL;DR

The paper addresses how depth cues and semantic edges influence indoor mobility with simulated prosthetic vision in immersive VR. It deploys a neurobiologically inspired SPV model across four modes (EdgesOnly, DepthOnly, EdgesAndDepth, EdgesOrDepth) to assess obstacle avoidance and object identification, revealing depth information as the primary driver of safe navigation and highlighting mixed results for object recognition. The study contributes a VR-based platform, a systematic comparison of cue strategies, and insights into user preferences and design trade-offs for future visual neuroprostheses. This work advances understanding of how to integrate computer-vision-based scene simplification into prosthetic devices to improve real-world usability and safety.

Abstract

Visual neuroprostheses (bionic eyes) have the potential to treat degenerative eye diseases that often result in low vision or complete blindness. These devices rely on an external camera to capture the visual scene, which is then translated frame-by-frame into an electrical stimulation pattern that is sent to the implant in the eye. To highlight more meaningful information in the scene, recent studies have tested the effectiveness of deep-learning based computer vision techniques, such as depth estimation to highlight nearby obstacles (DepthOnly mode) and semantic edge detection to outline important objects in the scene (EdgesOnly mode). However, nobody has attempted to combine the two, either by presenting them together (EdgesAndDepth) or by giving the user the ability to flexibly switch between them (EdgesOrDepth). Here, we used a neurobiologically inspired model of simulated prosthetic vision (SPV) in an immersive virtual reality (VR) environment to test the relative importance of semantic edges and relative depth cues to support the ability to avoid obstacles and identify objects. We found that participants were significantly better at avoiding obstacles using depth-based cues as opposed to relying on edge information alone, and that roughly half the participants preferred the flexibility to switch between modes (EdgesOrDepth). This study highlights the relative importance of depth cues for SPV mobility and is an important first step towards a visual neuroprosthesis that uses computer vision to improve a user's scene understanding.

Paper Structure (23 sections, 7 figures)

Figures (7)

  • Figure 1: Room layouts. Participants started in the center along the bottom wall. Participants were instructed to walk towards the other end of the room while avoiding obstacles. At the end of the room, there was either one table with three objects on it, or three tables with one object on each. Participants had to identify the medium-sized cube located on one of the tables.
  • Figure 2: Example views of the table with different objects on it. The correct object to identify was always the medium-sized cube (indicated with a white dashed circle) among two distractor objects. The arrangement of the objects was pseudo-randomized on each trial. Users had to confirm their selection by pressing a button on the VIVE controllers.
  • Figure 3: SPV modes used for scene simplification. Top: In EdgesOnly mode, only semantic and structural edges are visualized. Middle: In DepthOnly mode, per-pixel ground-truth depth is inverted and linearly translated to grayscale level. Bottom: In EdgesAndDepth both edges and depth are visualized. A fourth mode, EdgesOrDepth, gave users the ability to toggle between EdgesOnly and DepthOnly modes by pressing a button on the VIVE controller.
  • Figure 4: Obstacle avoidance (OA) performance, measured by success rate (i.e., the number of trials with zero collisions, Panel A) and time taken (Panel C), and object selection (OS) performance, measured by accuracy (i.e., the fraction of trials where the correct object was selected, Panel B), and time taken (Panel D). The dashed line in Panel B indicates chance performance (33%). Vertical bars are the standard error of the mean (SEM). Statistical significance was determined using paired $t$-tests, corrected for multiple testing using the Holm-Sidak method (*: $p<.05$).
  • Figure 5: Bird's-eye view of paths taken in the different rooms (columns) using the different scene simplification strategies (rows). For the sake of clarity, only the paths from every fourth subject are shown. The color of each path becomes more saturated as time progresses. Collisions are indicated with a black + sign. Circles indicate the location of the obstacles, and rectangles are the table. The task switched from obstacle avoidance to object selection as soon as the participant crossed the horizontal dashed line, which was invisible to them.
  • ...and 2 more figures
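
The Figure 3 caption says that in DepthOnly mode the per-pixel depth is inverted and linearly mapped to a grayscale level, so nearby surfaces appear brighter. The following is a minimal sketch of one way such a mapping could look, not the authors' implementation. The function name `depth_to_gray`, the `near`/`far` clipping range, and the 8-bit output are assumptions made for illustration; the caption states only the inversion and the linear mapping.

```python
from typing import List, Sequence


def depth_to_gray(
    depth: Sequence[Sequence[float]], near: float, far: float
) -> List[List[int]]:
    """Invert depth and map it linearly to 8-bit gray levels.

    Assumption (not from the paper): depth is clipped to [near, far].
    Pixels at or closer than `near` become 255 (brightest),
    pixels at or beyond `far` become 0 (black).
    """
    if far <= near:
        raise ValueError("far must be greater than near")
    span = far - near
    gray: List[List[int]] = []
    for row in depth:
        out_row: List[int] = []
        for d in row:
            clipped = min(max(d, near), far)
            # Inversion: closeness is 1.0 at `near` and 0.0 at `far`.
            closeness = (far - clipped) / span
            out_row.append(round(255 * closeness))
        gray.append(out_row)
    return gray


if __name__ == "__main__":
    # Hypothetical depths in meters for a single row of pixels.
    print(depth_to_gray([[0.5, 1.0, 2.0, 5.0, 9.0]], near=1.0, far=5.0))
```

With the hypothetical range of 1 m to 5 m, anything beyond 5 m is rendered black, which matches the stated goal of highlighting nearby obstacles while suppressing the distant background.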
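
The Figure 4 caption reports paired $t$-tests corrected for multiple testing with the Holm-Sidak method. The sketch below shows the textbook step-down form of that correction applied to a list of raw p-values. It is not the authors' analysis code, the function name `holm_sidak` is hypothetical, and the p-values in the usage line are made-up numbers, not results from the paper.

```python
from typing import List, Sequence


def holm_sidak(p_values: Sequence[float]) -> List[float]:
    """Return Holm-Sidak adjusted p-values in the original order.

    Step-down procedure: the k-th smallest p-value (k = 0, 1, ...)
    is adjusted as 1 - (1 - p)^(m - k), and adjusted values are
    forced to be non-decreasing along the sorted order.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # Sidak adjustment with (m - rank) hypotheses still in play.
        adj = 1.0 - (1.0 - p_values[idx]) ** (m - rank)
        running_max = max(running_max, adj)
        adjusted[idx] = min(running_max, 1.0)
    return adjusted


if __name__ == "__main__":
    # Illustrative raw p-values only (not taken from the study).
    print(holm_sidak([0.01, 0.04, 0.03]))
```

A comparison would then be marked significant at the caption's threshold when its adjusted p-value falls below .05.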