MODA: The First Challenging Benchmark for Multispectral Object Detection in Aerial Images

Shuaihao Han, Tingfa Xu, Peifu Liu, Jianan Li

TL;DR

This work tackles the scarcity of large-scale multispectral data for aerial object detection by introducing MODA, a diverse, high-resolution multispectral image (MSI) dataset with 8 spectral bands and oriented annotations. It also proposes OSSDet, a single-stream framework that integrates spectral and spatial cues with object-aware guidance through modules such as CSSP, SACF, and object activation masking. The approach achieves state-of-the-art results on MODA and HOD3K, and ablations confirm that spectral-spatial integration and object-aware cues both contribute to robust detection of small and cluttered objects in aerial scenes. Together, MODA and OSSDet provide a foundation for practical multispectral aerial detection and future research.

Abstract

Aerial object detection faces significant challenges in real-world scenarios, such as small objects and extensive background interference, which limit the performance of RGB-based detectors with insufficient discriminative information. Multispectral images (MSIs) capture additional spectral cues across multiple bands, offering a promising alternative. However, the lack of training data has been the primary bottleneck to exploiting the potential of MSIs. To address this gap, we introduce the first large-scale dataset for Multispectral Object Detection in Aerial images (MODA), which comprises 14,041 MSIs and 330,191 annotations across diverse, challenging scenarios, providing a comprehensive data foundation for this field. Furthermore, to overcome challenges inherent to aerial object detection using MSIs, we propose OSSDet, a framework that integrates spectral and spatial information with object-aware cues. OSSDet employs a cascaded spectral-spatial modulation structure to optimize target perception, aggregates spectrally related features by exploiting spectral similarities to reinforce intra-object correlations, and suppresses irrelevant background via object-aware masking. Moreover, cross-spectral attention further refines object-related representations under explicit object-aware guidance. Extensive experiments demonstrate that OSSDet outperforms existing methods with comparable parameters and efficiency.

Paper Structure

This paper contains 16 sections, 13 equations, 9 figures, 8 tables.

Figures (9)

  • Figure 1: In challenging scenarios, limited spatial information in RGB data hampers effective detection. In contrast, multispectral images offer additional spectral cues that significantly enhance target discrimination for robust detection.
  • Figure 2: Comparison with other multispectral object detection datasets and examples of challenge attributes in MODA.
  • Figure 3: Statistical analysis of MODA. Distribution of instance counts across 8 categories (a) and per MSI (b). (c) Relative and absolute distributions of instance sizes.
  • Figure 4: (a) Overall OSSDet framework. SACF fuses aggregated spectral features with spatially enhanced details to reinforce intra-object correlations and spatial texture details; (b) CSSP integrates spectral and spatial awareness to improve target perception; (c) CAFR refines object-related representations with explicit object-aware cues and cross-spectral attention.
  • Figure 5: Illustration of SACF. Euc-Dis denotes the Euclidean distance used to measure similarity between spectral vectors.
  • ...and 4 more figures
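
The abstract and the Figure 5 caption describe aggregating spectrally related features using Euclidean distance between spectral vectors. The sketch below illustrates that general idea only and is not the authors' SACF module: the function names, the softmax weighting over negative distances, the `temperature` parameter, and the toy 8-band pixel values are all assumptions made for illustration.

```python
import math


def euclidean_distance(u, v):
    """Euclidean distance between two spectral vectors of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def aggregate_by_spectral_similarity(spectra, query_index, temperature=1.0):
    """Hypothetical helper: weight every pixel by its spectral similarity to
    the query pixel, then return the weights and the weighted-average spectrum.

    Weights are a softmax over negative Euclidean distances (an assumed
    weighting scheme, not taken from the paper).
    """
    query = spectra[query_index]
    distances = [euclidean_distance(query, s) for s in spectra]

    # Smaller distance -> larger logit -> larger weight.
    logits = [-d / temperature for d in distances]
    max_logit = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - max_logit) for x in logits]
    total = sum(exps)
    weights = [e / total for e in exps]

    # Band-wise weighted average of all spectral vectors.
    num_bands = len(query)
    aggregated = [
        sum(w * s[b] for w, s in zip(weights, spectra))
        for b in range(num_bands)
    ]
    return weights, aggregated


# Toy data: three pixels with 8 bands each (MODA images have 8 spectral bands).
# The first two mimic pixels on the same object; the third mimics background.
pixels = [[0.90] * 8, [0.85] * 8, [0.10] * 8]
weights, aggregated = aggregate_by_spectral_similarity(pixels, query_index=0)
```

In this toy setup the spectrally similar pixel receives a much larger weight than the background pixel, which is the intuition behind reinforcing intra-object correlations through spectral similarity.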