Semantics from Space: Satellite-Guided Thermal Semantic Segmentation Annotation for Aerial Field Robots

Connor Lee, Saraswati Soedarmadji, Matthew Anderson, Anthony J. Clark, Soon-Jo Chung

TL;DR

Automatically generating semantic segmentation annotations for aerial thermal imagery from satellite-derived data addresses a key obstacle to developing thermal semantic perception algorithms for field robots: the lack of annotated thermal field datasets and the time and cost of manual annotation. It enables precise and rapid annotation of thermal data from field collection efforts at a massively parallelizable scale.

Abstract

We present a new method to automatically generate semantic segmentation annotations for thermal imagery captured from an aerial vehicle by utilizing satellite-derived data products alongside onboard global positioning and attitude estimates. This new capability addresses a key obstacle to developing thermal semantic perception algorithms for field robots, namely the lack of annotated thermal field datasets and the time and cost of manual annotation, and enables precise and rapid annotation of thermal data from field collection efforts at a massively parallelizable scale. By incorporating a thermal-conditioned refinement step with visual foundation models, our approach can produce highly precise semantic segmentation labels using low-resolution satellite land cover data at little to no cost. It achieves 98.5% of the performance obtained with costly high-resolution options and demonstrates a 70-160% improvement over popular zero-shot semantic segmentation methods based on large vision-language models currently used for generating annotations for RGB imagery. Code will be available at: https://github.com/connorlee77/aerial-auto-segment.
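
To make the geometric idea in the abstract concrete, the sketch below renders a coarse label image by casting one ray per pixel from a posed camera and sampling a land cover raster where the ray meets the ground. It is a minimal illustration, not the authors' implementation: it assumes a pinhole camera, a flat ground plane at z = 0 in a local east-north-up frame (the paper uses elevation data instead), and a north-up raster. The function name `render_coarse_labels` and all of its parameters are hypothetical.

```python
import numpy as np

def render_coarse_labels(lulc, origin_xy, cell_size, K, R_wc, cam_pos, img_hw, ignore=255):
    """Sample a land cover raster along pixel rays intersected with flat ground.

    lulc:      (rows, cols) integer class raster; row 0 is the northern edge.
    origin_xy: (east, north) of the raster's top-left corner, in metres.
    cell_size: raster resolution in metres per cell.
    K:         3x3 pinhole intrinsics (assumed, no lens distortion).
    R_wc:      3x3 rotation taking camera-frame vectors to the ENU world frame.
    cam_pos:   camera position (east, north, up) in metres.
    img_hw:    (height, width) of the image to render.
    """
    cam_pos = np.asarray(cam_pos, dtype=float)
    h, w = img_hw
    # Homogeneous pixel-centre coordinates, shape (h, w, 3).
    u, v = np.meshgrid(np.arange(w) + 0.5, np.arange(h) + 0.5)
    pix = np.stack([u, v, np.ones_like(u)], axis=-1)
    # Back-project pixels to rays in the camera frame, then rotate into the world frame.
    rays_world = (pix @ np.linalg.inv(K).T) @ R_wc.T
    dz = rays_world[..., 2]
    # Only rays pointing downward can hit the ground plane z = 0.
    valid = dz < -1e-9
    scale = np.where(valid, -cam_pos[2] / np.where(valid, dz, -1.0), 0.0)
    ground = cam_pos + scale[..., None] * rays_world
    # Convert ground coordinates to raster indices (north-up raster).
    col = np.floor((ground[..., 0] - origin_xy[0]) / cell_size).astype(int)
    row = np.floor((origin_xy[1] - ground[..., 1]) / cell_size).astype(int)
    inside = valid & (row >= 0) & (row < lulc.shape[0]) & (col >= 0) & (col < lulc.shape[1])
    labels = np.full((h, w), ignore, dtype=lulc.dtype)
    labels[inside] = lulc[row[inside], col[inside]]
    return labels
```

Pixels whose rays miss the ground or fall outside the raster receive the `ignore` value, so they can be excluded from training. Replacing the flat-plane intersection with a ray march against an elevation model would be the natural next step under the same interface.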

Paper Structure

This paper contains 24 sections, 3 equations, 7 figures, 4 tables, and 1 algorithm.

Figures (7)

  • Figure 1: Proposed pipeline for automatically generating semantic segmentation annotations from satellite-derived data. Coarse segmentation labels for thermal images are rendered from Land Use and Land Cover (LULC) datasets and Digital Elevation Maps (DEM). The labels are refined using Segment Anything [kirillov2023segany] to capture fine details between segmentation instances.
  • Figure 2: Dense CRF refinement of the Dynamic World land cover raster using NAIP and PlanetScope imagery of Castaic Lake, CA. Results via PlanetScope convey the actual scenery at the time of thermal image capture due to its high revisit frequency, but at a lower 3 m spatial resolution. NAIP refinement offers 1 m resolution but is susceptible to changes in the terrain (notably, water levels of lakes) due to its triennial capture cycle. Zoom in to see key differences (outlined in dashed boxes).
  • Figure 3: Class mappings between LULC datasets and our ground truth evaluation set. The UAV thermal dataset is from [lee2024cart].
  • Figure 4: Generated segmentations from the baseline (ODISE [xu2023open]), our methods, and the ground truth (GT) using class mappings and colors from Fig. 3. Mismatches between CM-6 labels and GT can occur depending on the LULC source used but are resolved with CM-3. Segmentations for classes containing small, sparse, and thin instances (CM-6), e.g. low vegetation and built, are hard to render due to low LULC resolution and low thermal contrast during label refinement.
  • Figure 5: Rendered label refinement process with SAM [kirillov2023segany].
  • ...and 2 more figures
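
The class mapping and mask-based refinement steps named in the captions above can be sketched as follows. This is one plausible reading, not the procedure from the paper: source class IDs are remapped through a lookup table, and each segmentation mask then takes the majority coarse label among its pixels. The masks are assumed to be boolean arrays already produced by a segmenter such as SAM; the function names, class IDs, and the majority-vote rule are all hypothetical.

```python
import numpy as np

def remap_classes(labels, lut, ignore=255):
    """Map source class IDs to target IDs; IDs missing from `lut` become `ignore`.

    labels: integer array with values in 0..255.
    lut:    dict of {source_id: target_id} (hypothetical IDs, not the paper's).
    """
    table = np.full(256, ignore, dtype=np.uint8)
    for src, dst in lut.items():
        table[src] = dst
    return table[labels]

def refine_with_masks(coarse, masks, num_classes, ignore=255):
    """Assign every mask the majority coarse label found inside it.

    coarse: (h, w) uint8 label image rendered from low-resolution land cover.
    masks:  iterable of (h, w) boolean arrays, e.g. from a segmentation model.
    """
    refined = coarse.copy()
    # Paint large masks first so that smaller, more detailed masks overwrite them.
    for mask in sorted(masks, key=lambda m: m.sum(), reverse=True):
        votes = coarse[mask]
        votes = votes[votes != ignore]
        if votes.size == 0:
            continue  # no labelled pixels under this mask; leave it unchanged
        refined[mask] = np.bincount(votes, minlength=num_classes).argmax()
    return refined
```

Voting over the original coarse labels, rather than the partially refined image, keeps the result independent of mask order apart from the deliberate large-to-small overwrite. Pixels not covered by any mask keep their coarse label.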