Scene Action Maps: Behavioural Maps for Navigation without Metric Information
Joel Loo, David Hsu
TL;DR
This work tackles navigation with limited metric information by introducing Scene Action Maps (SAMs), a topological, behaviour-based representation $\mathcal{G}=(\mathcal{V},\mathcal{E})$ whose nodes $\mathcal{V}$ are changepoints and destinations and whose edges $\mathcal{E}$ encode the navigational actions between them. It presents a learnable map-reading pipeline that extracts SAMs from diverse 2D maps (hand-drawn sketches, floor-plans, satellite maps) using a node predictor and an edge predictor trained with a supervised contrastive (SupCon) loss and a differentiable Sinkhorn module, enabling generalisation across map types. Online navigation combines a planner over the SAM (using Dijkstra's algorithm) with a SAM Localisation System (SLS), which is based on a Graph Localisation Network (GLN) with a changepoint detector to switch behaviours; the SLS localises on edges rather than nodes to better reflect real robot motion. Real-world experiments on a quadruped robot show that SAMs extracted from non-metric maps support effective navigation, with data augmentation improving robustness to noisy SAMs and edge-label mispredictions, highlighting the practical value of non-metric, affordance-based navigation in complex environments.
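The TL;DR mentions a differentiable Sinkhorn module in the edge predictor but does not say what score matrix it normalises. As a hedged illustration only, the sketch below shows the standard log-domain Sinkhorn normalisation, which turns a square matrix of scores into an approximately doubly stochastic (soft assignment) matrix. The function names, the `tau` temperature, the iteration count and the random input scores are assumptions for illustration, not details of the authors' model.

```python
import numpy as np


def _logsumexp(x: np.ndarray, axis: int) -> np.ndarray:
    # Numerically stable log(sum(exp(x))) along one axis, keeping dims for broadcasting.
    m = np.max(x, axis=axis, keepdims=True)
    return m + np.log(np.sum(np.exp(x - m), axis=axis, keepdims=True))


def sinkhorn(scores: np.ndarray, n_iters: int = 100, tau: float = 1.0) -> np.ndarray:
    """Turn a square score matrix into an (approximately) doubly stochastic matrix.

    Alternately normalises rows and columns of exp(scores / tau), in log space
    for numerical stability. Every step is differentiable, so the same loop can
    sit inside a network trained end to end when written in an autodiff framework.
    """
    log_p = scores / tau
    for _ in range(n_iters):
        log_p = log_p - _logsumexp(log_p, axis=1)  # make each row sum to 1
        log_p = log_p - _logsumexp(log_p, axis=0)  # make each column sum to 1
    return np.exp(log_p)


# Hypothetical example: affinity scores between 4 "source" and 4 "target" items.
rng = np.random.default_rng(0)
scores = rng.normal(size=(4, 4))
P = sinkhorn(scores)
print(P.round(3))
print("column sums:", P.sum(axis=0).round(6))
print("row sums:", P.sum(axis=1).round(6))
```

Lowering `tau` sharpens the result towards a hard permutation; keeping it moderate keeps gradients informative during training.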
Abstract
Humans are remarkable in their ability to navigate without metric information. We can read abstract 2D maps, such as floor-plans or hand-drawn sketches, and use them to navigate in unseen rich 3D environments, without requiring prior traversals to map out these scenes in detail. We posit that this is enabled by the ability to represent the environment abstractly as interconnected navigational behaviours, e.g., "follow the corridor" or "turn right", while avoiding detailed, accurate spatial information at the metric level. We introduce the Scene Action Map (SAM), a behavioural topological graph, and propose a learnable map-reading method, which parses a variety of 2D maps into SAMs. Map-reading extracts salient information about navigational behaviours from the overlooked wealth of pre-existing, abstract and inaccurate maps, ranging from floor-plans to sketches. We evaluate the performance of SAMs for navigation by building and deploying a behavioural navigation stack on a quadrupedal robot. Videos and more information are available at: https://scene-action-maps.github.io.
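To make the idea of a behavioural topological graph concrete, the sketch below stores a toy SAM as a directed adjacency list whose edges carry behaviour labels such as "follow the corridor" or "turn right", and plans over it with Dijkstra's algorithm (the planner named in the TL;DR) to return the sequence of behaviours to execute. The class and function names, the place names, the choice of directed edges, and the per-edge costs (default 1, so planning minimises the number of behaviours) are hypothetical assumptions, not the authors' implementation.

```python
import heapq
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

# One plan step: (source node, behaviour label, target node).
Step = Tuple[str, str, str]


@dataclass
class SceneActionMap:
    """Toy SAM (hypothetical structure): nodes are changepoints/destinations,
    directed edges carry a navigational behaviour and a non-metric cost."""
    adj: Dict[str, List[Tuple[str, str, float]]] = field(default_factory=dict)

    def add_edge(self, u: str, v: str, behaviour: str, cost: float = 1.0) -> None:
        self.adj.setdefault(u, []).append((v, behaviour, cost))
        self.adj.setdefault(v, [])  # make sure the target node exists


def plan_behaviours(sam: SceneActionMap, start: str, goal: str) -> Optional[List[Step]]:
    """Dijkstra over the SAM; returns the behaviour sequence, or None if unreachable."""
    dist = {start: 0.0}
    prev: Dict[str, Tuple[str, str]] = {}
    heap = [(0.0, start)]
    done = set()
    while heap:
        d, u = heapq.heappop(heap)
        if u in done:
            continue  # stale heap entry
        done.add(u)
        if u == goal:
            break
        for v, behaviour, cost in sam.adj.get(u, []):
            nd = d + cost
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                prev[v] = (u, behaviour)
                heapq.heappush(heap, (nd, v))
    if goal not in dist:
        return None
    # Walk predecessor links back from the goal, then reverse into execution order.
    steps: List[Step] = []
    node = goal
    while node != start:
        u, behaviour = prev[node]
        steps.append((u, behaviour, node))
        node = u
    steps.reverse()
    return steps


sam = SceneActionMap()
sam.add_edge("lobby", "corridor_end", "follow the corridor")
sam.add_edge("corridor_end", "office", "turn right")
sam.add_edge("lobby", "stairs", "turn left")
sam.add_edge("stairs", "office", "go straight", cost=3.0)

plan = plan_behaviours(sam, "lobby", "office")
for u, behaviour, v in plan:
    print(f"{u} -> {v}: {behaviour}")
```

Here the corridor route (cost 2) beats the route via the stairs (cost 4), so the plan is "follow the corridor" then "turn right". Deciding when each behaviour has finished, which the TL;DR assigns to the SLS and its changepoint detector, is not modelled in this sketch.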
