Scene Action Maps: Behavioural Maps for Navigation without Metric Information

Joel Loo, David Hsu

TL;DR

This work tackles navigation under limited metric information by introducing Scene Action Maps (SAMs), a topological, behaviour-based representation $\mathcal{G}=(\mathcal{V},\mathcal{E})$ that encodes navigational actions on edges between changepoint and destination nodes. It presents a learnable map-reading pipeline that extracts SAMs from diverse 2D maps (hand-drawn sketches, floor-plans, satellite maps) using a node predictor and an edge predictor trained with SupCon and a differentiable Sinkhorn module, enabling generalisation across map types. Online navigation uses a planner over the SAM (via Dijkstra's algorithm) and a SAM Localisation System (SLS) based on a Graph Localisation Network (GLN), with a changepoint detector to switch behaviours; the SLS localises on edges rather than nodes to better reflect real robot motion. Real-world experiments on a quadruped robot show that SAMs extracted from non-metric maps support effective navigation, and that data augmentation improves robustness to noisy SAMs and edge-label mispredictions, highlighting the practical value of non-metric, affordance-based navigation in complex environments.
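The planning step described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the node names, behaviour labels, the `plan_behaviours` helper, and the unit edge cost are all assumptions made for the example. It shows how a shortest path over a behaviour-labelled graph turns into a sequence of behaviours for the robot to execute.

```python
import heapq

# Hypothetical SAM: node -> list of (neighbour, behaviour) directed edges.
# Node names and behaviour labels are illustrative, not taken from the paper.
SAM = {
    "A": [("B", "follow-corridor")],
    "B": [("C", "turn-right"), ("D", "go-forward")],
    "C": [("E", "follow-corridor")],
    "D": [],
    "E": [],
}

def plan_behaviours(sam, start, goal):
    """Dijkstra over the SAM with unit edge costs (an assumption).

    Returns the behaviours along a shortest path, or None if unreachable.
    """
    dist = {start: 0}
    parent = {}  # node -> (previous node, behaviour taken to reach node)
    frontier = [(0, start)]
    while frontier:
        d, node = heapq.heappop(frontier)
        if node == goal:
            break
        if d > dist.get(node, float("inf")):
            continue  # stale queue entry
        for nxt, behaviour in sam[node]:
            nd = d + 1
            if nd < dist.get(nxt, float("inf")):
                dist[nxt] = nd
                parent[nxt] = (node, behaviour)
                heapq.heappush(frontier, (nd, nxt))
    if goal not in dist:
        return None
    # Walk back from the goal, collecting the behaviour on each edge.
    plan = []
    node = goal
    while node != start:
        node, behaviour = parent[node]
        plan.append(behaviour)
    return plan[::-1]
```

Calling `plan_behaviours(SAM, "A", "E")` yields a behaviour sequence rather than a list of metric waypoints, which is the point of placing actions on edges: the plan can be executed by switching behaviours at changepoints without knowing distances.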

Abstract

Humans are remarkable in their ability to navigate without metric information. We can read abstract 2D maps, such as floor-plans or hand-drawn sketches, and use them to navigate in unseen, rich 3D environments, without requiring prior traversals to map out these scenes in detail. We posit that this is enabled by the ability to represent the environment abstractly as interconnected navigational behaviours, e.g., "follow the corridor" or "turn right", while avoiding detailed, accurate spatial information at the metric level. We introduce the Scene Action Map (SAM), a behavioural topological graph, and propose a learnable map-reading method, which parses a variety of 2D maps into SAMs. Map-reading extracts salient information about navigational behaviours from the overlooked wealth of pre-existing, abstract and inaccurate maps, ranging from floor-plans to sketches. We evaluate the performance of SAMs for navigation by building and deploying a behavioural navigation stack on a quadrupedal robot. Videos and more information are available at: https://scene-action-maps.github.io.

Paper Structure

This paper contains 13 sections, 2 equations, 5 figures, 2 tables, 1 algorithm.

Figures (5)

  • Figure 1: We propose a learnable map-reading system that extracts Scene Action Maps from pre-existing 2D maps, for behavioural navigation.
  • Figure 2: Overview of the online behavioural navigation system.
  • Figure 3: Applying $f_{ep}$ to the orange-marked node: 1) predicts a soft assignment matrix with $\phi_{edge}$ and Sinkhorn, 2) thresholds it to yield the outgoing edges.
  • Figure 4: SAMs extracted from (a) hand-drawn maps, (b) floor-plans, (c) satellite maps. The SAMs mostly capture the behaviours and environment structure accurately, apart from occasional errors (circled on maps) like confusing go-forward and turning edges, or missing edges.
  • Figure 5: Navigating our test environment is a challenge due to (left-to-right) wide open spaces, complex junctions, cluttered areas and dynamic obstacles.
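
The two steps named in the Figure 3 caption (Sinkhorn normalisation into a soft assignment matrix, then thresholding) can be illustrated with a small sketch. This is not the paper's edge predictor: the score matrix stands in for the output of $\phi_{edge}$, and the matrix layout, the iteration count, the threshold of 0.5, and the helper names `sinkhorn` and `threshold_edges` are all assumptions chosen for illustration.

```python
import math

def sinkhorn(scores, n_iters=50):
    """Alternately normalise rows and columns of exp(scores).

    For a square matrix this drives the result towards a doubly
    stochastic (soft assignment) matrix.
    """
    m = [[math.exp(s) for s in row] for row in scores]
    for _ in range(n_iters):
        # Row normalisation: each row sums to 1.
        m = [[v / sum(row) for v in row] for row in m]
        # Column normalisation: each column sums to 1.
        col_sums = [sum(col) for col in zip(*m)]
        m = [[v / c for v, c in zip(row, col_sums)] for row in m]
    return m

def threshold_edges(assignment, threshold=0.5):
    """Keep the (row, column) pairs whose soft assignment exceeds the threshold."""
    return [
        (i, j)
        for i, row in enumerate(assignment)
        for j, v in enumerate(row)
        if v > threshold
    ]

# Hypothetical raw scores standing in for the output of phi_edge.
scores = [
    [4.0, 0.0, 0.0],
    [0.0, 4.0, 0.0],
    [0.0, 0.0, 4.0],
]
soft = sinkhorn(scores)
edges = threshold_edges(soft)
```

Because every operation in `sinkhorn` is a differentiable exponentiation or division, the same procedure can sit inside a trained network, which is presumably why a Sinkhorn module is used; the hard thresholding is applied only when reading out the edges.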