Exosense: A Vision-Based Scene Understanding System For Exoskeletons
Jianeng Wang, Matias Mattamala, Christina Kassab, Guillaume Burger, Fabio Elnecave, Lintong Zhang, Marine Petriaux, Maurice Fallon
TL;DR
The paper tackles the challenge of enabling long-term indoor operation of self-balancing exoskeletons by introducing Exosense, a vision-centric scene understanding system. It combines a leg-mounted multi-sensor unit with a navigation stack that builds a reusable environment representation of terrain geometry, semantics, and traversability, supporting both localization and future navigation. The approach unifies visual-inertial odometry (via OpenVINS), semantic pose-graph SLAM (via LEXIS, with CLIP-based room labels), and a two-layer terrain mapping strategy with room-level fusion, plus a traversability analysis based on step height. Empirical results show an odometry drift of about 4 cm per meter traveled, average terrain reconstruction errors under 1 cm, successful multi-floor mapping and relocalization, and favorable traversability assessments, demonstrating practical viability for indoor exoskeleton deployment and long-term operation.
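The step-height-based traversability analysis mentioned above can be illustrated with a minimal sketch. It assumes a 2D elevation grid in meters and marks a cell as non-traversable when the height jump to any of its four neighbors exceeds a maximum step height. The function name `step_traversability`, the grid format, and the 0.15 m threshold are hypothetical choices for illustration, not details taken from the paper.

```python
from typing import List

Grid = List[List[float]]


def step_traversability(elevation: Grid, max_step: float = 0.15) -> List[List[bool]]:
    """Mark each cell traversable if no 4-neighbor differs in height by more than max_step.

    elevation: 2D grid of terrain heights in meters (hypothetical input format).
    max_step: illustrative maximum step height in meters, not a value from the paper.
    """
    rows, cols = len(elevation), len(elevation[0])
    result = [[True] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            h = elevation[r][c]
            # Largest height jump from this cell to any in-bounds 4-neighbor.
            step = 0.0
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                nr, nc = r + dr, c + dc
                if 0 <= nr < rows and 0 <= nc < cols:
                    step = max(step, abs(elevation[nr][nc] - h))
            # Traversable only if every neighboring step is within the limit.
            result[r][c] = step <= max_step
    return result
```

For a flat floor next to a 0.17 m riser, e.g. `[[0.0, 0.0, 0.17], [0.0, 0.0, 0.17]]`, the sketch flags the cells on both sides of the edge as non-traversable and keeps the first column traversable. Marking both sides of a step is a conservative simplification; a real system would likely reason over the foot footprint and the direction of travel.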
Abstract
Self-balancing exoskeletons are a key enabling technology for individuals with mobility impairments. While current challenges center on human-compliant hardware and control, unlocking their use for daily activities requires a scene perception system. In this work, we present Exosense, a vision-centric scene understanding system for self-balancing exoskeletons. We introduce a multi-sensor visual-inertial mapping device as well as a navigation stack for state estimation, terrain mapping and long-term operation. We tested Exosense attached to both a human leg and Wandercraft's Personal Exoskeleton in real-world indoor scenarios. This allowed us to evaluate the system both during typical periodic walking gaits and in multi-story environments representative of future use cases. We demonstrate that Exosense can achieve an odometry drift of about 4 cm per meter traveled, and construct terrain maps with under 1 cm average reconstruction error. It can also work in a visual localization mode in a previously mapped environment, providing a step towards long-term operation of exoskeletons.
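To make the "drift per meter traveled" figure concrete, here is a minimal sketch of one simple way to compute it: the end-point position error divided by the ground-truth distance traveled. The helper names `path_length` and `drift_per_meter`, and the assumption that both trajectories are time-aligned in a common frame from the same starting pose, are illustrative; the paper's exact evaluation protocol may differ (for example, relative pose error over fixed-length segments).

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float, float]


def path_length(points: Sequence[Point]) -> float:
    """Sum of Euclidean distances between consecutive positions (meters)."""
    return sum(math.dist(a, b) for a, b in zip(points, points[1:]))


def drift_per_meter(estimated: Sequence[Point], ground_truth: Sequence[Point]) -> float:
    """End-point position error divided by ground-truth distance traveled.

    Assumes both trajectories are time-aligned, expressed in the same frame,
    and start from the same pose. This is one simple drift definition (an
    assumption), not necessarily the metric used in the paper.
    """
    if len(estimated) != len(ground_truth) or len(ground_truth) < 2:
        raise ValueError("need two aligned trajectories with at least 2 poses")
    distance = path_length(ground_truth)
    end_error = math.dist(estimated[-1], ground_truth[-1])
    return end_error / distance
```

For a 10 m straight walk whose estimate ends 0.4 m away from the ground-truth end point, the sketch returns approximately 0.04, i.e. about 4 cm of drift per meter traveled.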
