IntentionNet: Map-Lite Visual Navigation at the Kilometre Scale

Wei Gao; Bo Ai; Joel Loo; Vinay; David Hsu

TL;DR

IntentionNet addresses the challenge of scalable, robust long-range visual navigation by combining a classical global planner with a learned low-level controller that is steered by high-level intention signals. It introduces two intentions, Local Path and Environment (LPE) and Discretised Local Move (DLM), and shows that DLM in particular is robust to mapping and localisation errors, enabling kilometre-scale navigation on a real robot. A concrete instantiation, Kilo-IntentionNet, uses a DECISION controller with per-behaviour memory modules to navigate diverse indoor and outdoor environments despite noisy odometry. The work shows that pairing topological planning with a robust, end-to-end learned controller yields scalable planning, improved obstacle avoidance and strong generalisation, with practical implications for long-range robot navigation in the real world.
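The idea of per-behaviour memory modules can be illustrated with a toy layer. The sketch below keeps one recurrent state per behaviour mode and lets a symbolic signal choose which state is updated. It is a minimal sketch under stated assumptions: the class `ModeSwitchedMemory`, the mode names and the simple tanh recurrence are hypothetical stand-ins for DECISION's convolutional memory layers, and only the structure (disentangled memories selected by a symbolic signal) is meant to carry over.

```python
import numpy as np

# Hypothetical mode set; the DLM values used in the paper may differ.
MODES = ("forward", "left", "right")


class ModeSwitchedMemory:
    """Toy memory layer with one disentangled recurrent state per behaviour mode.

    A plain tanh recurrence stands in for the DECISION memory modules; only the
    structure (separate memory per mode, chosen by a symbolic signal) is shown.
    """

    def __init__(self, feat_dim: int, seed: int = 0):
        rng = np.random.default_rng(seed)
        # Each mode owns its own weights and its own hidden state.
        self.w_in = {m: rng.normal(scale=0.1, size=(feat_dim, feat_dim)) for m in MODES}
        self.w_h = {m: rng.normal(scale=0.1, size=(feat_dim, feat_dim)) for m in MODES}
        self.h = {m: np.zeros(feat_dim) for m in MODES}

    def step(self, features: np.ndarray, mode: str) -> np.ndarray:
        # The symbolic signal selects which memory integrates the new features;
        # the memories of all other modes are left unchanged.
        self.h[mode] = np.tanh(self.w_in[mode] @ features + self.w_h[mode] @ self.h[mode])
        # Enrich the current features with the selected mode's history.
        return features + self.h[mode]
```

Calling `mem.step(x, "left")` updates only `mem.h["left"]`. Keeping each mode's history separate mirrors the motivation given in the Figure 5 caption: disentangled memories make it easier to learn multimodal behaviours without one mode's history contaminating another's.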

Abstract

This work explores the challenges of building a scalable and robust robot navigation system that can traverse both indoor and outdoor environments to reach distant goals. We propose a navigation system architecture called IntentionNet that employs a monolithic neural network as the low-level planner/controller and steers it through a general interface that we call intentions. The paper proposes two types of intentions, Local Path and Environment (LPE) and Discretised Local Move (DLM), and shows that DLM is robust to significant metric positioning and mapping errors. The paper also presents Kilo-IntentionNet, an instance of IntentionNet that uses the DLM intention. Deployed on a Boston Dynamics Spot robot, it successfully navigates complex indoor and outdoor environments over distances of up to a kilometre using only noisy odometry.
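To make the robustness claim for DLM more concrete, the sketch below shows one hypothetical way a planned path could be reduced to a discrete move command. The function `dlm_from_path`, the three-command set, the 5 m lookahead and the 30° turn threshold are all assumptions for illustration, not the paper's procedure. Because the output is only a coarse turn direction, modest errors in the position or heading estimate usually leave the command unchanged.

```python
import math


def dlm_from_path(robot_xy, robot_yaw, path, lookahead=5.0, turn_threshold_deg=30.0):
    """Map a planned path to a coarse discrete move command (hypothetical rule).

    robot_xy: (x, y) position estimate, e.g. from noisy odometry.
    robot_yaw: heading estimate in radians (counter-clockwise from +x).
    path: list of (x, y) waypoints from the global planner.
    """
    # Pick the first waypoint at least `lookahead` metres away, else the last one.
    target = path[-1]
    for wx, wy in path:
        if math.hypot(wx - robot_xy[0], wy - robot_xy[1]) >= lookahead:
            target = (wx, wy)
            break
    # Bearing to the target relative to the robot's heading, wrapped to [-pi, pi].
    bearing = math.atan2(target[1] - robot_xy[1], target[0] - robot_xy[0])
    delta = math.atan2(math.sin(bearing - robot_yaw), math.cos(bearing - robot_yaw))
    threshold = math.radians(turn_threshold_deg)
    if delta > threshold:
        return "left"  # counter-clockwise turn in a standard x-y frame
    if delta < -threshold:
        return "right"
    return "forward"


straight = [(float(i), 0.0) for i in range(20)]
print(dlm_from_path((0.0, 0.0), 0.0, straight))  # forward
print(dlm_from_path((0.5, 0.3), 0.2, straight))  # forward, despite the pose error
```

In the second call the pose estimate is off by about half a metre and 0.2 rad, yet the command is still `forward`. A metric intention that encodes the exact path geometry would instead inherit the full pose error.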

Paper Structure

This paper contains 40 sections, 8 equations, 13 figures and 3 tables.

Introduction
Related work
Classical navigation system architecture
Learned navigation system architecture
Long-range navigation systems
Learned controllers for visual navigation
System overview
Architecture
Intentions
Low-level controller
Steering controllers with intentions
Controller backbone
LPE-steered controller
DLM-steered controller
Improving robustness to partial observability
...and 25 more sections

Figures (13)

Figure 1: We demonstrate Kilo-IntentionNet's capability for long-range navigation on complex routes that mix diverse indoor and outdoor environments, and that cover distances of up to a kilometre. While Kilo-IntentionNet uses a learned controller, we show that it is capable of generalising to visually different environments not seen in its training data: the red segments of the path indicate novel environments which the controller was not trained on. The orange and blue stars mark the start and end of each route respectively.
Figure 2: Autonomous navigation system overview.
Figure 3: An illustrative representation of the Local Path and Environment (LPE) intention. It is a cropped section of the map, where the robot's historical path is visualized as a continuous red curve, while the planned future trajectory is delineated by a distinct blue curve.
Figure 4: Neural network controller architectures for different intention types. (a) LPE contains rich semantic information that can be extracted with CNNs. We directly concatenate LPE features and RGB features, since their information content is comparable. (b) DLM is a piece of symbolic information. Instead of concatenating it with the RGB features, we incorporate it into a switch module that conditionally selects the corresponding modes in the control predictions.
Figure 5: The neural network architecture of our DECISION controller. Every 3D volume in the figure denotes a feature map of shape (channel, width, height). The colored volumes denote the latent representation of the history, and blank volumes denote the spatial features at the current time step. Dashed lines between volumes denote operations such as convolutions. Key ideas: As features propagate through the convolutional layers, the representation becomes more abstract (visualized in bottom row), and the memory layers (#1-#3) integrate history information at multiple abstraction levels to enrich the representation. To learn multimodal behaviors, the memory modules for different modes in each memory layer are disentangled (volumes in different colors) and a symbolic signal is used to select the corresponding memory for feature propagation.
...and 8 more figures
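Figure 3 describes the LPE intention as a cropped section of the map showing the robot's historical path in red and its planned future trajectory in blue. The numpy sketch below shows one way such an image could be rendered. It assumes a grayscale map, paths given as (row, col) pixel coordinates, and a square crop centred on the robot's estimated position. `render_lpe` and all of its parameters are hypothetical, and the paper's actual map source, crop size and orientation handling are not specified here.

```python
import numpy as np


def render_lpe(map_gray, center_rc, half_size, past_rc, planned_rc):
    """Render a toy LPE-style image: a map crop with past (red) and planned (blue) paths.

    map_gray: 2-D uint8 array (e.g. a floor plan in grayscale).
    center_rc: (row, col) of the robot's estimated position in map pixels.
    half_size: half the side length of the square crop, in pixels.
    past_rc / planned_rc: sequences of (row, col) map pixels.
    """
    r0 = center_rc[0] - half_size
    c0 = center_rc[1] - half_size
    size = 2 * half_size
    # Pad with zeros so crops near the map border stay square.
    padded = np.pad(map_gray, half_size, mode="constant", constant_values=0)
    crop = padded[r0 + half_size : r0 + half_size + size, c0 + half_size : c0 + half_size + size]
    rgb = np.stack([crop] * 3, axis=-1)

    def draw(points, color):
        pts = np.asarray(points, dtype=float)
        for (ra, ca), (rb, cb) in zip(pts[:-1], pts[1:]):
            # Densely sample each segment so the path appears as a continuous curve.
            n = int(max(abs(rb - ra), abs(cb - ca))) + 1
            rows = np.rint(np.linspace(ra, rb, n)).astype(int) - r0
            cols = np.rint(np.linspace(ca, cb, n)).astype(int) - c0
            keep = (rows >= 0) & (rows < size) & (cols >= 0) & (cols < size)
            rgb[rows[keep], cols[keep]] = color

    draw(past_rc, (255, 0, 0))  # historical path in red
    draw(planned_rc, (0, 0, 255))  # planned future trajectory in blue
    return rgb


lpe = render_lpe(np.full((100, 100), 128, dtype=np.uint8), (50, 50), 20,
                 past_rc=[(60, 50), (50, 50)], planned_rc=[(50, 50), (30, 50)])
print(lpe.shape)  # (40, 40, 3)
```

Because the crop is centred on the estimated position, any localisation error shifts the drawn paths relative to the surrounding map content. This offers one intuition for why a metric intention such as LPE is more exposed to positioning errors than the symbolic DLM.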