IN-Sight: Interactive Navigation through Sight

Philipp Schoch, Fan Yang, Yuntao Ma, Stefan Leutenegger, Marco Hutter, Quentin Leboutet

TL;DR

This work introduces IN-Sight, a self-supervised approach to path planning that navigates more effectively by interacting with obstacles, and demonstrates the system's real-world applicability with zero-shot sim-to-real transfer.

Abstract

Current visual navigation systems often treat the environment as static, lacking the ability to adaptively interact with obstacles. This limitation leads to navigation failure when encountering unavoidable obstructions. In response, we introduce IN-Sight, a novel approach to self-supervised path planning, enabling more effective navigation strategies through interaction with obstacles. Utilizing RGB-D observations, IN-Sight calculates traversability scores and incorporates them into a semantic map, facilitating long-range path planning in complex, maze-like environments. To precisely navigate around obstacles, IN-Sight employs a local planner, trained imperatively on a differentiable costmap using representation learning techniques. The entire framework undergoes end-to-end training within the state-of-the-art photorealistic Intel SPEAR Simulator. We validate the effectiveness of IN-Sight through extensive benchmarking in a variety of simulated scenarios and ablation studies. Moreover, we demonstrate the system's real-world applicability with zero-shot sim-to-real transfer, deploying our planner on the legged robot platform ANYmal, showcasing its practical potential for interactive navigation in real environments.
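
To illustrate why distinguishing interactive from static obstacles matters for long-range planning, the following is a minimal sketch, not the authors' implementation. It assumes a three-class grid map (free, interactive, static) and runs plain Dijkstra search in which interactive cells are passable at a higher cost. The class labels, the cost values in `CELL_COST`, and the `plan` function are all hypothetical choices made for this example.

```python
import heapq

# Hypothetical cell labels for a traversability grid map.
FREE, INTERACTIVE, STATIC = 0, 1, 2
# Assumed traversal costs; STATIC cells are treated as impassable.
CELL_COST = {FREE: 1.0, INTERACTIVE: 5.0}


def plan(grid, start, goal):
    """Dijkstra over a 4-connected grid. Returns (cost, path) or (inf, [])."""
    rows, cols = len(grid), len(grid[0])
    dist = {start: 0.0}
    parent = {}
    queue = [(0.0, start)]
    while queue:
        d, cell = heapq.heappop(queue)
        if cell == goal:
            # Walk parent links back to the start to recover the path.
            path = [cell]
            while cell in parent:
                cell = parent[cell]
                path.append(cell)
            return d, path[::-1]
        if d > dist.get(cell, float("inf")):
            continue  # stale queue entry
        r, c = cell
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if not (0 <= nr < rows and 0 <= nc < cols):
                continue
            label = grid[nr][nc]
            if label == STATIC:
                continue  # static obstacles can never be entered
            nd = d + CELL_COST[label]
            if nd < dist.get((nr, nc), float("inf")):
                dist[(nr, nc)] = nd
                parent[(nr, nc)] = cell
                heapq.heappush(queue, (nd, (nr, nc)))
    return float("inf"), []


# A wall (2) with a single gap that is blocked by a pushable object (1).
grid = [
    [0, 2, 0],
    [0, 1, 0],
    [0, 2, 0],
]
cost, path = plan(grid, (1, 0), (1, 2))

# The same map if the pushable object were treated as a static obstacle.
static_grid = [[STATIC if v == INTERACTIVE else v for v in row] for row in grid]
blocked_cost, blocked_path = plan(static_grid, (1, 0), (1, 2))
```

In this toy map the goal is reachable only by passing through the interactive cell, so a planner that treats every obstacle as static finds no path at all.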

Paper Structure (12 sections, 7 equations, 7 figures, 1 table)

Figures (7)

  • Figure 1: Interactive Navigation: while traversing to the goal, the agent avoids stationary obstructions (brown) and pushes light obstacles (green) out of the way. Imperative Training: A self-supervised training methodology is used to train the agent end-to-end on a differentiable costmap. Traversability Estimation (TE): the agent learns to distinguish between traversable terrain (blue/grey), interactive objects (green) and static obstacles (brown) by solving TE as a co-task. The traversability estimates are integrated into a map for long-horizon path planning.
  • Figure 2: In each time step, RGB-D inputs ($R_\text{k}, D_\text{k}$) are provided to the planner. The perception module then produces traversability estimates $T_\text{k}$ which are continuously integrated into a map. Using this map, a global path to the goal is computed from which a subgoal $G_\text{k}$ is selected. The subgoal is fed to the local planner which computes the local path ($wp_\text{k}$).
  • Figure 3: Learned modules of the planner, separated into perception (left) and planning (right). Trapezoids denote neural networks which change the spatial size of the embeddings. The blue boxes denote sampling blocks where an embedding tensor is drawn from a parametrized distribution $q_\phi(z)$. The residuals of the loss function are denoted by red boxes.
  • Figure 4: Ideal depth image from SPEAR (left) and the same image corrupted by simulated noise (middle), compared with a Kinect depth image from the SUN-RGBD dataset (right).
  • Figure 5: Left: A simulated agent (white sphere) approaches an obstacle (1, green), pushes it aside (2), and continues to the goal (3). Center: First-person traversability estimates overlaid on the RGB inputs. Right: The agent's global traversability grid map (blue: free space, green: interactive obstacles, brown: static obstacles).
  • ...and 2 more figures
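
The Figure 2 caption says a subgoal $G_\text{k}$ is selected from the global path but does not say how. One common heuristic, shown here purely as an assumption, is a lookahead rule: start from the waypoint nearest the robot and take the first waypoint at least a fixed distance away. The function name `select_subgoal` and the `lookahead` parameter are hypothetical.

```python
import math


def select_subgoal(global_path, position, lookahead=2.0):
    """Pick the first waypoint at least `lookahead` away from `position`.

    The search starts at the waypoint closest to the robot so that waypoints
    already passed are not selected. Falls back to the final waypoint (the
    goal) when no remaining waypoint is far enough away.
    """
    nearest = min(
        range(len(global_path)),
        key=lambda i: math.dist(global_path[i], position),
    )
    for waypoint in global_path[nearest:]:
        if math.dist(waypoint, position) >= lookahead:
            return waypoint
    return global_path[-1]


# Straight global path along the x-axis, one waypoint per metre.
global_path = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0), (4.0, 0.0)]

subgoal_mid = select_subgoal(global_path, (0.9, 0.1), lookahead=2.0)
subgoal_end = select_subgoal(global_path, (3.5, 0.0), lookahead=2.0)
```

Near the end of the path the rule falls back to the final waypoint, so the local planner is eventually handed the goal itself.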