Exosense: A Vision-Based Scene Understanding System For Exoskeletons

Jianeng Wang, Matias Mattamala, Christina Kassab, Guillaume Burger, Fabio Elnecave, Lintong Zhang, Marine Petriaux, Maurice Fallon

TL;DR

The paper tackles the challenge of enabling long-term, indoor operation for self-balancing exoskeletons by introducing Exosense, a vision-centric scene understanding system. It combines a leg-mounted multi-sensor unit with a navigation stack that builds a reusable environment representation featuring terrain geometry, semantics, and traversability, enabling localization and future navigation. The approach unifies visual-inertial odometry (via OpenVINS), semantic pose-graph SLAM (via LEXIS with CLIP-based room labels), and a two-layer terrain mapping strategy with room-level fusion, plus a step-height-based traversability analysis. Empirical results show an odometry drift of about 4 cm per meter, terrain reconstruction errors under 1 cm on average, successful multi-floor mapping and relocalization, and favorable traversability assessments, demonstrating practical viability for indoor exoskeleton deployment and long-term operation.
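The step-height-based traversability analysis mentioned above can be illustrated with a minimal sketch. This is not the paper's implementation: the grid layout, the 4-neighbour comparison, the function name `step_height_traversability`, and the threshold `max_step_m` are all assumptions made for illustration. A cell is treated as traversable when the height change to each known neighbour stays within the largest step the device is assumed to handle.

```python
from typing import List, Optional

# Elevation grid in meters; None marks a cell with no measurement.
Grid = List[List[Optional[float]]]


def step_height_traversability(
    elevation: Grid, max_step_m: float
) -> List[List[Optional[bool]]]:
    """Label each known cell True/False; unknown cells stay None.

    Hypothetical rule: a cell is traversable if the absolute height
    difference to every known 4-connected neighbour is <= max_step_m.
    """
    rows = len(elevation)
    cols = len(elevation[0]) if rows else 0
    labels: List[List[Optional[bool]]] = []
    for r in range(rows):
        label_row: List[Optional[bool]] = []
        for c in range(cols):
            height = elevation[r][c]
            if height is None:
                label_row.append(None)  # no data, no decision
                continue
            largest_step = 0.0
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                nr, nc = r + dr, c + dc
                if 0 <= nr < rows and 0 <= nc < cols:
                    neighbour = elevation[nr][nc]
                    if neighbour is not None:
                        largest_step = max(largest_step, abs(neighbour - height))
            label_row.append(largest_step <= max_step_m)
        labels.append(label_row)
    return labels


if __name__ == "__main__":
    # Flat floor on the left, a 0.30 m ledge on the right (made-up values).
    grid = [
        [0.0, 0.0, 0.30],
        [0.0, 0.05, 0.30],
        [None, 0.0, 0.30],
    ]
    for row in step_height_traversability(grid, max_step_m=0.10):
        print(row)
```

With this rule, cells on both sides of a ledge are flagged, since either one would require the large step. A real system would likely also account for slope, foot size, and map uncertainty.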

Abstract

Self-balancing exoskeletons are a key enabling technology for individuals with mobility impairments. While the current challenges focus on human-compliant hardware and control, unlocking their use for daily activities requires a scene perception system. In this work, we present Exosense, a vision-centric scene understanding system for self-balancing exoskeletons. We introduce a multi-sensor visual-inertial mapping device as well as a navigation stack for state estimation, terrain mapping and long-term operation. We tested Exosense attached to both a human leg and Wandercraft's Personal Exoskeleton in real-world indoor scenarios. This enabled us to test the system during typical periodic walking gaits, as well as future uses in multi-story environments. We demonstrate that Exosense can achieve an odometry drift of about 4 cm per meter traveled, and construct terrain maps under 1 cm average reconstruction error. It can also work in a visual localization mode in a previously mapped environment, providing a step towards long-term operation of exoskeletons.
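To make the "drift per meter traveled" figure concrete, here is a minimal sketch of one simple way such a ratio can be computed. The evaluation protocol used in the paper is not given in this excerpt, so the end-point-error definition, the function names, and the sample trajectories below are assumptions for illustration only. Both trajectories are assumed to be time-synchronised and expressed in the same frame.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float, float]  # (x, y, z) in meters


def path_length(trajectory: Sequence[Point]) -> float:
    """Sum of straight-line distances between consecutive positions."""
    return sum(math.dist(a, b) for a, b in zip(trajectory, trajectory[1:]))


def drift_per_meter(
    estimated: Sequence[Point], ground_truth: Sequence[Point]
) -> float:
    """End-point position error divided by ground-truth distance traveled."""
    if len(estimated) != len(ground_truth) or len(ground_truth) < 2:
        raise ValueError("trajectories must have equal length of at least 2")
    traveled = path_length(ground_truth)
    if traveled == 0.0:
        raise ValueError("ground-truth trajectory has zero length")
    end_error = math.dist(estimated[-1], ground_truth[-1])
    return end_error / traveled


if __name__ == "__main__":
    # Made-up example: 2 m straight walk, estimate ends 8 cm off to the side.
    truth = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
    estimate = [(0.0, 0.0, 0.0), (1.0, 0.02, 0.0), (2.0, 0.08, 0.0)]
    print(f"{drift_per_meter(estimate, truth) * 100:.1f} cm per meter")
```

Segment-wise relative pose error is a common alternative to this end-point ratio and is less sensitive to where the trajectory happens to end.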

Paper Structure (19 sections, 3 equations, 8 figures, 3 tables)

Figures (8)

  • Figure 1: Exosense scene understanding system. The inputs are RGB images and 3D point clouds from the multi-sensor unit. Different modules provide odometry and sensor-rate terrain maps, which are integrated into a semantic pose graph and processed with a terrain reconstruction module. The final scene representation (bottom right) contains terrain geometry, semantics, and traversability as well as visual localization information to aid long-term operation.
  • Figure 2: Sample of linear acceleration and angular rotation rates measured by Exosense in exoskeleton (top) and human-leg-mounted (bottom) configurations. Both modes have a similar gait duration. The highlighted spikes (pink) occur during foot strikes.
  • Figure 3: Exp B: CPU usage over time for the evaluated odometry algorithms over five runs. The darker lines show the mean, while the shaded areas show the 95% confidence interval. OpenVINS is significantly more lightweight, using about half the computation.
  • Figure 4: Exp C: Multi-story mapping of sequence H4 in the Home dataset. Exosense generated a globally consistent multi-floor terrain map. Each room is a single submap colored by its type.
  • Figure 5: Exp D: Qualitative mapping result after submap merging of the Exo sequence, colored by elevation. Staircases and part of the ground areas are shown in detail both before and after applying submap merging (bottom). The median-based merging method removed outliers in the terrain submap while preserving the sharp features and edges of the terrain geometry.
  • ...and 3 more figures
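
The median-based merging mentioned in the Figure 5 caption can be illustrated with a minimal sketch. It assumes the overlapping submaps have already been registered onto one common elevation grid of identical shape, which is a simplification. The function name `merge_submaps_median` and the sample values are hypothetical, and the paper's actual merging pipeline may differ.

```python
from statistics import median
from typing import List, Optional, Sequence

# Elevation grid in meters; None marks a cell with no measurement.
Grid = List[List[Optional[float]]]


def merge_submaps_median(submaps: Sequence[Grid]) -> Grid:
    """Fuse aligned elevation grids by taking the per-cell median.

    The median ignores missing cells and is robust to a minority of
    outlier heights, while a consistently observed step edge is kept
    as-is instead of being blurred as it would be by a mean.
    """
    if not submaps:
        raise ValueError("at least one submap is required")
    rows = len(submaps[0])
    cols = len(submaps[0][0]) if rows else 0
    if any(len(m) != rows or any(len(row) != cols for row in m) for m in submaps):
        raise ValueError("all submaps must share the same grid shape")
    merged: Grid = []
    for r in range(rows):
        merged_row: List[Optional[float]] = []
        for c in range(cols):
            samples = [m[r][c] for m in submaps if m[r][c] is not None]
            merged_row.append(median(samples) if samples else None)
        merged.append(merged_row)
    return merged


if __name__ == "__main__":
    # Three made-up observations of the same 1 x 3 strip of terrain.
    # The 0.50 m reading in the third submap is an outlier.
    observations = [
        [[0.00, 0.20, None]],
        [[0.01, 0.20, None]],
        [[0.50, 0.21, 0.10]],
    ]
    print(merge_submaps_median(observations))
```

In this example the 0.50 m outlier does not affect the first merged cell, and the third cell is filled from the single submap that observed it.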