AeroGen: Enhancing Remote Sensing Object Detection with Diffusion-Driven Data Generation
Datao Tang, Xiangyong Cao, Xuan Wu, Jialin Li, Jing Yao, Xueru Bai, Dongsheng Jiang, Yin Li, Deyu Meng
TL;DR
Remote sensing image object detection (RSIOD) suffers from limited labeled data, which hampers detector performance. The authors introduce AeroGen, a layout-controllable diffusion framework that supports both horizontal and rotated bounding box conditioning to generate high-quality remote sensing images. It is paired with an end-to-end data augmentation pipeline that uses a diversity-conditioned generator and filtering to promote diversity and semantic-layout coherence. Key contributions include a layout-conditional diffusion model with Fourier-encoded layout inputs and layout mask attention, plus a five-stage generative pipeline (label generation, filtering, image generation, filtering, and augmentation) that produces synthetic data used alongside real data to train detectors. Empirical results on DIOR, DIOR-R, and HRSC show consistent mAP improvements (+3.7%, +4.3%, and +2.43%, respectively) and notable gains on rare object classes. Overall, the results indicate that conditional diffusion with precise layout control is an effective way to expand RSIOD datasets and improve downstream object detection.
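To make "Fourier-encoded layout inputs" concrete, the following is a minimal sketch of one common way to Fourier-encode normalized box coordinates before they are passed to a conditioning network. The function name `fourier_encode`, the parameter `num_freqs`, and the power-of-two frequency schedule are assumptions made for illustration; the summary above does not specify AeroGen's exact encoding.

```python
import math


def fourier_encode(coords, num_freqs=4):
    """Map each coordinate in [0, 1] to sin/cos features.

    Hypothetical helper: frequencies are 2^k * pi for k = 0..num_freqs-1,
    which is a common choice but not confirmed by the source.
    """
    features = []
    for c in coords:
        for k in range(num_freqs):
            angle = (2.0 ** k) * math.pi * c
            features.append(math.sin(angle))
            features.append(math.cos(angle))
    return features


# A horizontal box given as normalized (x_min, y_min, x_max, y_max).
box = [0.25, 0.40, 0.75, 0.60]
embedding = fourier_encode(box, num_freqs=4)
print(len(embedding))  # 4 coordinates * 4 frequencies * 2 (sin, cos) = 32
```

The encoding lifts a handful of scalar coordinates into a higher-dimensional vector in which nearby positions remain distinguishable, which is the usual motivation for using it on layout inputs.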
Abstract
Remote sensing image object detection (RSIOD) aims to identify and locate specific objects within satellite or aerial imagery. However, labeled data are scarce in current RSIOD datasets, which significantly limits the performance of current detection algorithms. Although existing techniques, e.g., data augmentation and semi-supervised learning, can mitigate this scarcity to some extent, they depend heavily on high-quality labeled data and perform worse on rare object classes. To address this issue, this paper proposes a layout-controllable diffusion generative model (i.e., AeroGen) tailored for RSIOD. To our knowledge, AeroGen is the first model to simultaneously support horizontal and rotated bounding box conditional generation, thus enabling the generation of high-quality synthetic images that meet specific layout and object category requirements. Additionally, we propose an end-to-end data augmentation framework that integrates a diversity-conditioned generator and a filtering mechanism to enhance both the diversity and quality of generated data. Experimental results demonstrate that the synthetic data produced by our method are of high quality and diversity. Furthermore, the synthetic RSIOD data can significantly improve the detection performance of existing RSIOD models: the mAP on the DIOR, DIOR-R, and HRSC datasets improves by 3.7%, 4.3%, and 2.43%, respectively. The code is available at https://github.com/Sonettoo/AeroGen.
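As an illustration of how a single conditioning format could cover both horizontal and rotated boxes, the sketch below converts each box type to the same four-corner representation. This is an assumption for illustration only: the abstract does not state how AeroGen unifies the two box types, and the function names and the (center, width, height, angle) parameterization of rotated boxes are hypothetical.

```python
import math


def hbb_to_corners(x_min, y_min, x_max, y_max):
    """Corners of a horizontal box, starting at the top-left and following the box edges."""
    return [(x_min, y_min), (x_max, y_min), (x_max, y_max), (x_min, y_max)]


def obb_to_corners(cx, cy, w, h, angle_rad):
    """Corners of a rotated box given center, size, and rotation angle.

    Each corner offset is rotated by angle_rad about the box center.
    With angle_rad == 0 the result matches hbb_to_corners.
    """
    cos_a, sin_a = math.cos(angle_rad), math.sin(angle_rad)
    offsets = [(-w / 2, -h / 2), (w / 2, -h / 2), (w / 2, h / 2), (-w / 2, h / 2)]
    return [
        (cx + dx * cos_a - dy * sin_a, cy + dx * sin_a + dy * cos_a)
        for dx, dy in offsets
    ]


# Both box types end up as four (x, y) points, i.e. eight numbers per object.
horizontal = hbb_to_corners(0.3, 0.4, 0.7, 0.6)
rotated = obb_to_corners(0.5, 0.5, 0.4, 0.2, math.pi / 6)
print(horizontal)
print(rotated)
```

Under this representation a horizontal box is simply the zero-rotation case, so one layout encoder can consume labels from datasets such as DIOR (horizontal) and DIOR-R or HRSC (rotated) without separate code paths.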
