MMO-IG: Multi-Class and Multi-Scale Object Image Generation for Remote Sensing
Chuang Yang, Bingxuan Zhao, Qing Zhou, Qi Wang
TL;DR
MMO-IG tackles data scarcity in remote sensing image object detection (RSIOD) by introducing three novel components: an iso-spacing instance map (ISIM) for fine-grained control of multi-class, multi-scale objects (MMOs); a spatial-cross dependency knowledge graph (SCDKG) that models complex inter-object spatial dependencies; and a structured object distribution instruction (SODI) that aligns global image content with instance-level supervision. A diffusion-based generator decodes an ISIM-conditioned prediction under SODI guidance. ISIM regions are mapped to grayscale class codes by a defined $v_{gray}$ function, and inter-object relationships are enforced through the $p_{id}$ matrix. Experiments on DIOR and DIOR-R show improved image realism and strong transfer to downstream detectors: MMO-IG outperforms existing layout-to-image and RS generative models in FID and CAS, and its synthetic data provides useful augmentation gains for several detectors. The work advances data generation for RSIOD by enabling robust training with densely MMO-labeled RS images, while acknowledging limitations related to rare instances, large target counts, and cross-domain generalization.
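The iso-spacing idea behind ISIM can be illustrated with a minimal sketch. The exact form of $v_{gray}$ is not given here, so the code assumes a hypothetical version: class index 0 is background, classes 1..K receive evenly spaced grayscale levels in [0, 255], and objects are rasterized as axis-aligned boxes, drawn largest first so that small objects stay visible. The names `iso_spacing_values` and `build_isim` are illustrative and are not taken from the MMO-IG code.

```python
import numpy as np


def iso_spacing_values(num_classes: int, max_value: int = 255) -> dict[int, int]:
    """Hypothetical v_gray: map class 0 (background) and classes 1..num_classes
    to evenly spaced grayscale levels in [0, max_value]."""
    step = max_value / num_classes
    return {c: int(round(c * step)) for c in range(num_classes + 1)}


def build_isim(
    height: int,
    width: int,
    boxes: list[tuple[int, int, int, int, int]],
    num_classes: int,
    max_value: int = 255,
) -> np.ndarray:
    """Rasterize boxes (x0, y0, x1, y1, class_id), class_id in 1..num_classes,
    into a single-channel iso-spacing instance map (assumed encoding)."""
    v_gray = iso_spacing_values(num_classes, max_value)
    # Every pixel starts with the background code.
    isim = np.full((height, width), v_gray[0], dtype=np.uint8)
    # Paint larger boxes first so that smaller, overlapping objects end up on top.
    by_area_desc = sorted(boxes, key=lambda b: (b[2] - b[0]) * (b[3] - b[1]), reverse=True)
    for x0, y0, x1, y1, cls in by_area_desc:
        isim[y0:y1, x0:x1] = v_gray[cls]
    return isim
```

With K = 20 (the number of DIOR categories), adjacent classes are 12.75 gray levels apart before rounding, so each class code stays separable in the conditioning map. Equal spacing is one simple way to keep codes maximally distinct in a single channel; the paper's actual mapping may differ.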
Abstract
The rapid development of deep generative models (DGMs) has significantly advanced research in computer vision, providing a cost-effective alternative to acquiring vast quantities of expensive imagery. However, existing methods predominantly focus on synthesizing remote sensing (RS) images that align with real images only in a global layout view, which limits their applicability to RS image object detection (RSIOD) research. To address this limitation, we propose a DGM-based multi-class and multi-scale object image generator, termed MMO-IG, designed to generate RS images with supervised object labels from global and local aspects simultaneously. Specifically, from the local view, MMO-IG encodes various RS instances using an iso-spacing instance map (ISIM). During generation, it decodes each instance region of the ISIM, whose iso-spacing values cover both background and foreground instances, to produce RS images through the denoising process of diffusion models. Because MMOs exhibit complex interdependencies, we construct a spatial-cross dependency knowledge graph (SCDKG), which ensures a realistic and reliable multidirectional distribution among MMOs for region embedding and thereby reduces the discrepancy between the source and target domains. In addition, we propose a structured object distribution instruction (SODI) that, together with the SCDKG-based ISIM, guides the content of the synthesized RS images from a global aspect. Extensive experimental results demonstrate that MMO-IG exhibits superior capabilities for generating RS images with dense MMO-supervised labels, and that RS detectors pre-trained with MMO-IG show excellent performance on real-world datasets.
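To make "multidirectional distribution among MMOs" more concrete, the following hypothetical sketch estimates, for every ordered pair of object classes, how often the second object appears in each of eight compass directions relative to the first. It is a stand-in for intuition only, not the paper's SCDKG construction, which the abstract does not specify: the eight-sector quantization, the use of box centres, and the names `direction` and `build_dependency_graph` are all assumptions.

```python
import math
from collections import defaultdict

# Eight direction sectors, counter-clockwise starting from east (assumed scheme).
DIRECTIONS = ["E", "NE", "N", "NW", "W", "SW", "S", "SE"]


def direction(src: tuple[float, float], dst: tuple[float, float]) -> str:
    """Quantize the angle from the src centre to the dst centre into one of
    8 sectors of 45 degrees. Image y grows downward, so dy is negated to make
    'N' point toward the top of the image. Coincident centres map to 'E'."""
    dx = dst[0] - src[0]
    dy = src[1] - dst[1]
    angle = math.degrees(math.atan2(dy, dx)) % 360
    return DIRECTIONS[int(((angle + 22.5) % 360) // 45)]


def build_dependency_graph(
    images: list[list[tuple[str, float, float]]],
) -> dict[tuple[str, str], dict[str, float]]:
    """images: one list of (class_name, cx, cy) per image.
    Returns {(class_a, class_b): {direction: probability}}, i.e. where class_b
    tends to lie relative to class_a over all co-occurring object pairs."""
    counts: dict = defaultdict(lambda: defaultdict(int))
    for objects in images:
        for i, (ca, xa, ya) in enumerate(objects):
            for j, (cb, xb, yb) in enumerate(objects):
                if i == j:
                    continue  # skip self-pairs
                counts[(ca, cb)][direction((xa, ya), (xb, yb))] += 1
    graph = {}
    for pair, dir_counts in counts.items():
        total = sum(dir_counts.values())
        # Normalize raw counts into a per-pair direction distribution.
        graph[pair] = {d: n / total for d, n in dir_counts.items()}
    return graph
```

For example, if harbors were usually annotated east of ships, the ("ship", "harbor") edge would concentrate its probability on "E", and a layout sampler could draw relative object positions from such distributions. Counting pairwise directions is only one simple way to capture multidirectional co-occurrence statistics.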
