Hierarchical end-to-end autonomous navigation through few-shot waypoint detection

Amin Ghafourian; Zhongying CuiZhu; Debo Shi; Ian Chuang; Francois Charette; Rithik Sachdeva; Iman Soltani

TL;DR

This work tackles autonomous navigation under limited localization by introducing a Description-based Navigation System (DNS) that relies on a hierarchical end-to-end architecture and few-shot landmark detection. It formulates a distribution-embedding, metric-based approach to recognize waypoint landmarks from minimal examples and triggers a low-level maneuver controller via a lookup-based high-level action. The key contributions are the two-stage DNS framework, a novel distribution-based few-shot learning method using mean/covariance embeddings and a distribution-to-distribution distance, and empirical validation on unseen indoor routes with ablation studies showing the impact of backbone pretraining and metric choice. The approach promises data-efficient, adaptable navigation with reduced reliance on precise localization, demonstrated on a small-scale vehicle in indoor environments.

Abstract

Human navigation is facilitated through the association of actions with landmarks, tapping into our ability to recognize salient features in our environment. Consequently, navigational instructions for humans can be extremely concise, such as short verbal descriptions, indicating a small memory requirement and no reliance on complex and overly accurate navigation tools. Conversely, current autonomous navigation schemes rely on accurate positioning devices and algorithms as well as extensive streams of sensory data collected from the environment. Inspired by this human capability and motivated by the associated technological gap, in this work we propose a hierarchical end-to-end meta-learning scheme that enables a mobile robot to navigate in a previously unknown environment upon presentation of only a few sample images of a set of landmarks along with their corresponding high-level navigation actions. This dramatically simplifies the wayfinding process and enables easy adaptation to new environments. For few-shot waypoint detection, we implement a metric-based few-shot learning technique through distribution embedding. Waypoint detection triggers the multi-task low-level maneuver controller module to execute the corresponding high-level navigation action. We demonstrate the effectiveness of the scheme using a small-scale autonomous vehicle on novel indoor navigation tasks in several previously unseen environments.
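
The high-level half of the scheme described in the abstract (detect a waypoint, then look up its navigation action) can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration and is not taken from the paper: the `Waypoint` record, the `detect_waypoint` and `high_level_step` helpers, the `"follow_path"` default action, and the use of plain Euclidean distance with a fixed threshold in place of the learned distribution-based metric and classifier.

```python
from dataclasses import dataclass
from math import dist
from typing import List, Optional, Sequence


@dataclass
class Waypoint:
    """One memory slot: a landmark embedding paired with its action (hypothetical layout)."""
    name: str
    prototype: Sequence[float]  # stand-in embedding built from the few example images
    action: str                 # high-level navigation action, e.g. "turn_left"


def detect_waypoint(query: Sequence[float], memory: List[Waypoint],
                    threshold: float) -> Optional[Waypoint]:
    """Return the nearest stored waypoint if it lies within `threshold`, else None."""
    if not memory:
        return None
    best = min(memory, key=lambda w: dist(query, w.prototype))
    return best if dist(query, best.prototype) <= threshold else None


def high_level_step(query: Sequence[float], memory: List[Waypoint],
                    threshold: float, default_action: str = "follow_path") -> str:
    """Look up the action of the detected waypoint; otherwise keep the default action."""
    hit = detect_waypoint(query, memory, threshold)
    return hit.action if hit else default_action


# Route teaching: populate memory with (embedding, action) pairs.
memory = [
    Waypoint("door", [0.0, 1.0], "turn_right"),
    Waypoint("poster", [1.0, 0.0], "turn_left"),
]

# Inference: a query near "door" triggers its action; a distant query does not.
action_near = high_level_step([0.1, 0.9], memory, threshold=0.5)
action_far = high_level_step([5.0, 5.0], memory, threshold=0.5)
```

In the described system the returned action would condition the low-level maneuver controller, which is not modeled in this sketch.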

Paper Structure (18 sections, 4 equations, 5 figures, 3 tables)

Introduction
Related Works
Description-Based Navigation System
Few-shot learning for waypoint detection
Few-shot classification: problem formulation
Enhanced metric few-shot learning
DNS with distribution embeddings
Experiments
Dataset and training
Offline evaluation
Online evaluation
Results and ablation study
Offline evaluation results and the effect of backbone pretraining, metric, and image quality
Online evaluation results
Discussion and future work
...and 3 more sections

Figures (5)

Figure 1: Proposed workflow. One or a few example images are used by the mobile robot to detect predefined landmarks in the environment. Upon landmark detection, the corresponding high-level discrete navigation action (e.g. turn right, turn left, etc.) is retrieved from a lookup table and passed to a continuous maneuver controller. Continuous control executes the resulting high-level navigation action while avoiding obstacles and maintaining the vehicle on a drivable path.
Figure 2: Route teaching stage. Prior to vehicle departure, the image representations from predetermined waypoints are used to populate the corresponding memory slots along with the high-level navigation action for future reference.
Figure 3: High-level navigation module. During inference, the navigation module processes the incoming images to compare them against memory content and detect waypoints. Upon detection, the corresponding high-level navigation action is retrieved from the lookup table and issued to the maneuver control unit.
Figure 4: The low-level maneuver control module is composed of a feature extractor that processes the camera input, which is then concatenated with the discrete navigation action (e.g. turn right) and presented to the continuous controller to obtain steering and acceleration/deceleration outputs.
Figure 5: For each waypoint, a sequence of frames from an existing recording at the waypoint location (gray outline) is passed through the backbone and the distribution estimation model to obtain a corresponding class distribution for each frame (shown in gray). The distributions are then combined to form the waypoint prototype, which is stored in memory for use during wayfinding. For a query frame sequence, the distribution is obtained in the same way. The distance between the memory prototype and the query distribution computed in real time is fed to the classifier for detection (shown in green for a match and in red for a mismatch). Once a waypoint is detected, the corresponding navigation action is retrieved from the lookup table (LUT) to condition the low-level continuous control module.
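
The prototype-and-distance steps in the Figure 5 caption can be sketched as follows. The caption does not specify how per-frame distributions are combined or which distance is used, so this sketch makes its own assumptions: each frame is summarized by a mean and a diagonal covariance, frames are combined by moment-matching an equal-weight mixture, the distance is the squared 2-Wasserstein distance between diagonal Gaussians, and a fixed threshold stands in for the classifier. The function names `frame_stats_to_prototype`, `w2_squared`, and `is_match` are hypothetical.

```python
from math import sqrt
from typing import List, Sequence, Tuple


def frame_stats_to_prototype(means: Sequence[Sequence[float]],
                             variances: Sequence[Sequence[float]]
                             ) -> Tuple[List[float], List[float]]:
    """Combine per-frame diagonal Gaussians by moment-matching an equal-weight mixture."""
    n = len(means)
    d = len(means[0])
    proto_mean = [sum(m[j] for m in means) / n for j in range(d)]
    # Mixture variance = average second moment minus squared mixture mean.
    proto_var = [
        sum(v[j] + m[j] ** 2 for m, v in zip(means, variances)) / n - proto_mean[j] ** 2
        for j in range(d)
    ]
    return proto_mean, proto_var


def w2_squared(mean_a: Sequence[float], var_a: Sequence[float],
               mean_b: Sequence[float], var_b: Sequence[float]) -> float:
    """Squared 2-Wasserstein distance between two diagonal Gaussians."""
    mean_term = sum((a - b) ** 2 for a, b in zip(mean_a, mean_b))
    cov_term = sum((sqrt(a) - sqrt(b)) ** 2 for a, b in zip(var_a, var_b))
    return mean_term + cov_term


def is_match(distance: float, threshold: float) -> bool:
    """Threshold stand-in for the learned classifier (an assumption of this sketch)."""
    return distance <= threshold


# Two example frames at one waypoint, each with a mean and a diagonal variance.
proto_mean, proto_var = frame_stats_to_prototype(
    means=[[0.0, 2.0], [2.0, 2.0]],
    variances=[[1.0, 1.0], [1.0, 1.0]],
)

# A query identical to the prototype, and one whose mean is far away.
d_same = w2_squared(proto_mean, proto_var, [1.0, 2.0], [2.0, 1.0])
d_far = w2_squared(proto_mean, proto_var, [4.0, 6.0], [2.0, 1.0])
```

Note that the spread between the two frame means raises the prototype variance along the first dimension, so the prototype reflects viewpoint variation across the example frames as well as per-frame uncertainty.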