Towards Long-term Robotics in the Wild

Stephen Hausler; Ethan Griffiths; Milad Ramezani; Peyman Moghadam

TL;DR

This paper addresses the scarcity of benchmarks for field robotics in natural environments and presents WildPlaces and WildScenes, large-scale, multi-modal datasets with 6-DoF ground truth collected over 14 months on forest trails. The authors describe the sensor platform, ground-truth generation via Wildcat SLAM, and the data pre-processing that enables both intra-sequence and inter-sequence tasks, including place recognition and semantic segmentation. They evaluate baseline methods for 2D and 3D semantic segmentation, lidar place recognition, and multi-modal place recognition, and analyse the impact of long-term temporal shifts. The results demonstrate the datasets' utility for studying long-term robotics in wild environments and point to future directions for multi-modal learning, such as traversability estimation, depth completion, and optical flow.

Abstract

In this paper, we emphasise the critical importance of large-scale datasets for advancing field robotics capabilities, particularly in natural environments. While numerous datasets exist for urban and suburban settings, those tailored to natural environments are scarce. Our recent benchmarks WildPlaces and WildScenes address this gap by providing synchronised image, lidar, semantic and accurate 6-DoF pose information in forest-type environments. We highlight the multi-modal nature of these datasets and demonstrate their utility in various downstream tasks, such as place recognition and 2D and 3D semantic segmentation.

Paper Structure (9 sections, 4 figures, 1 table)

Introduction
Background and Related Work
Dataset Description
Sensor Platform
Ground Truth
Pre-processing
Dataset Use Cases
Discussion
Conclusion and Future Work

Figures (4)

Figure 1: Our sensor platform is a modular device that contains four cameras, a spinning lidar sensor, an encoder, an IMU and a GPS, and can be hand-held or mounted on a mobile robot.
Figure 2: Example places from our dataset showing the different modalities. The first row shows the RGB modality, the second row the annotated 2D image, and the third row the 3D point cloud, which is also annotated with semantics. The bottom row displays the projection of the 3D points onto the 2D images, colour-coded by depth.
Figure 3: Examples of semantic submaps (top) compared to their corresponding submaps used for lidar place recognition (PR) (bottom). Semantic point clouds include only points falling in the frustum of the front camera.
Figure 4: Terrain types present in our dataset.
