Caltech Aerial RGB-Thermal Dataset in the Wild

Connor Lee, Matthew Anderson, Nikhil Raganathan, Xingxing Zuo, Kevin Do, Georgia Gkioxari, Soon-Jo Chung

TL;DR

The paper introduces the Caltech Aerial RGB-Thermal Dataset, a public collection designed to advance thermal perception and motion tracking for field robotics in natural environments. It provides synchronized RGB, thermal, GPS, and IMU data with 4195 annotated thermal frames across rivers, lakes, coasts, deserts, and forests, and establishes benchmarks for thermal and RGB-T semantic segmentation, RGB-T image translation, and motion tracking under temporal and geographical domain shifts. Experiments on multiple baselines show that current methods struggle with cross-domain generalization and thermal-specific challenges in open natural scenes, that RGB-T fusion improves results at a high computational cost, and that zero-shot foundation models transfer poorly to thermal imagery. The dataset and accompanying code aim to catalyze progress in perception and localization that remain robust at night and in adverse weather, enabling broader deployment of aerial field robots in remote environments.

Abstract

We present the first publicly available RGB-thermal dataset designed for aerial robotics operating in natural environments. Our dataset captures a variety of terrain across the United States, including rivers, lakes, coastlines, deserts, and forests, and consists of synchronized RGB, thermal, global positioning, and inertial data. We provide semantic segmentation annotations for 10 classes commonly encountered in natural settings in order to drive the development of perception algorithms robust to adverse weather and nighttime conditions. Using this dataset, we propose new and challenging benchmarks for thermal and RGB-thermal (RGB-T) semantic segmentation, RGB-T image translation, and motion tracking. We present extensive results using state-of-the-art methods and highlight the challenges posed by temporal and geographical domain shifts in our data. The dataset and accompanying code are available at https://github.com/aerorobotics/caltech-aerial-rgbt-dataset.

Paper Structure (42 sections, 7 figures, 8 tables)

Figures (7)

  • Figure 1: Left: Our dataset is uniquely designed to improve thermal scene perception for field robots. Right: We provide new benchmarks for thermal-based vision algorithms, including (a) semantic segmentation, (b) image translation, and (c) motion tracking.
  • Figure 2: (a) The Aurelia X6 hexacopter and sensor stack used to capture our aerial dataset. (b) Geographic distribution of our data collection sites and collection times.
  • Figure 3: Semantic segmentation classes in our dataset. The color mapping is used throughout this paper. (a) Hourly distribution of annotated thermal images. (b) Histogram of semantic classes.
  • Figure 4: Thermal images and semantic segmentation labels from each capture area with inference results from EfficientViT, FastSCNN, and ConvNext-B (CLIP).
  • Figure 5: (a) Thermal semantic segmentation failures due to intra-class semantic variations when testing on geographically-partitioned, out-of-domain data. (b) Failures due to photometric variations between day (in-domain) and night (out-of-domain).
  • ...and 2 more figures