MMO-IG: Multi-Class and Multi-Scale Object Image Generation for Remote Sensing

Chuang Yang, Bingxuan Zhao, Qing Zhou, Qi Wang

TL;DR

MMO-IG tackles data scarcity in remote sensing image object detection by introducing three novel components: ISIM for fine-grained, multi-class, multi-scale object control; SCDKG for modeling complex inter-object spatial dependencies; and SODI for aligning global image content with instance-level supervision. The method uses a diffusion-based generator to decode an ISIM-conditioned prediction under SODI guidance, with ISIM regions mapped to grayscale class codes via a defined $v_{gray}$ function and inter-object relationships enforced by the $p_{id}$ matrix. Experimental results on DIOR and DIOR-R demonstrate improved image realism and strong transfer to downstream detectors, outperforming existing layout-to-image and RS generative models in FID and CAS, and providing useful data augmentation gains for several detectors. The work advances data generation for RSIOD, enabling robust training with dense MMO-labeled RS images while acknowledging limitations related to rare instances, large target counts, and cross-domain generalization.

Abstract

The rapid advancement of deep generative models (DGMs) has significantly benefited research in computer vision, providing a cost-effective alternative to acquiring vast quantities of expensive imagery. However, existing methods predominantly focus on synthesizing remote sensing (RS) images aligned with real images in a global layout view, which limits their applicability in RS image object detection (RSIOD) research. To address this limitation, we propose a multi-class and multi-scale object image generator based on DGMs, termed MMO-IG, designed to generate RS images with supervised object labels from global and local aspects simultaneously. Specifically, from the local view, MMO-IG encodes various RS instances using an iso-spacing instance map (ISIM). During the generation process, it decodes each instance region with its iso-spacing value in the ISIM (corresponding to both background and foreground instances) to produce RS images through the denoising process of diffusion models. Considering the complex interdependencies among multi-class and multi-scale objects (MMOs), we construct a spatial-cross dependency knowledge graph (SCDKG). This ensures a realistic and reliable multidirectional distribution among MMOs for region embedding, thereby reducing the discrepancy between source and target domains. In addition, we propose a structured object distribution instruction (SODI) that, together with the SCDKG-based ISIM, guides the generation of synthesized RS image content from a global aspect. Extensive experimental results demonstrate that our MMO-IG exhibits superior generation capabilities for RS images with dense MMO-supervised labels, and RS detectors pre-trained with MMO-IG show excellent performance on real-world datasets.
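The ISIM idea described in the abstract can be illustrated with a minimal sketch. It assumes that each instance is given as an axis-aligned box with a class index, that background is coded as 0, and that class codes are evenly spaced gray levels. The names `v_gray` and `build_isim`, the spacing rule `255 // num_classes`, and the box format are illustrative assumptions, not the paper's definitions.

```python
from typing import List, Tuple

# Hypothetical box format: (class_index, x0, y0, x1, y1), class_index in 1..num_classes,
# with x1 and y1 exclusive.
Box = Tuple[int, int, int, int, int]


def v_gray(class_index: int, num_classes: int) -> int:
    """Assumed iso-spacing code: 0 for background, evenly spaced gray levels for classes."""
    if not 0 <= class_index <= num_classes:
        raise ValueError("class_index out of range")
    step = 255 // num_classes  # equal gap between neighbouring class codes
    return class_index * step


def build_isim(height: int, width: int, boxes: List[Box], num_classes: int) -> List[List[int]]:
    """Paint each instance region with its class code; later boxes overwrite earlier ones."""
    isim = [[0] * width for _ in range(height)]
    for class_index, x0, y0, x1, y1 in boxes:
        gray = v_gray(class_index, num_classes)
        # Clip the box to the canvas so out-of-range coordinates are ignored.
        for y in range(max(0, y0), min(height, y1)):
            for x in range(max(0, x0), min(width, x1)):
                isim[y][x] = gray
    return isim


# Example with 20 classes: one small instance of class 1, one larger of class 20.
isim = build_isim(4, 6, [(1, 0, 0, 2, 2), (20, 3, 1, 6, 4)], num_classes=20)
```

Because location and extent are carried by the painted region and the class by its gray level, a single-channel map of this kind can keep class, position, and scale together in one conditioning image.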

Paper Structure

This paper contains 17 sections, 2 equations, 10 figures, 4 tables, 1 algorithm.

Figures (10)

  • Figure 1: Illustration of the generation of RS images containing MMOs by the proposed MMO-IG.
  • Figure 2: Overall pipeline of MMO-IG for generating RS images with dense instance-level bounding box labels. It first synthesizes rational spatial geometric characteristics of MMOs via SCDKG. These are then encoded via the designed ISIM, while SODI describes the RS image content. Finally, the diffusion model decodes the ISIM into an RS image containing MMOs under the guidance of SODI.
  • Figure 3: Illustration of the proposed SCDKG, which models complex interdependencies among objects of different classes via the $\rm p_{id}$ matrix and their diverse spatial geometric characteristics via $\mathcal{P}_{\rm sgc}$. Notably, each RS object is modeled by a unique $\mathcal{P}_{\rm sgc}$ according to the corresponding realistic geometric characteristics.
  • Figure 4: Illustration of the proposed ISIM, which encodes instances of different classes with different grayscale values while preserving the location and scale characteristics of the corresponding regions in the generated image.
  • Figure 5: Illustration of the proposed SODI generation process. The instruction combines a structured scene head description ("A remote sensing image with") with a statistical description of the RS objects.
  • ...and 5 more figures
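
The SODI construction described in the Figure 5 caption (a fixed scene head followed by object statistics) can be sketched as follows. Only the head string "A remote sensing image with" comes from the caption. The function name `build_sodi`, the "count name" phrasing, the comma separator, and the handling of an empty label list are assumptions made for illustration, and no pluralization is attempted.

```python
from collections import Counter
from typing import Iterable

# Scene head quoted in the Figure 5 caption.
SCENE_HEAD = "A remote sensing image with"


def build_sodi(labels: Iterable[str]) -> str:
    """Join the scene head with per-class instance counts (assumed phrasing)."""
    counts = Counter(labels)  # keeps first-seen order of class names
    if not counts:
        return f"{SCENE_HEAD} no objects."
    parts = [f"{n} {name}" for name, n in counts.items()]
    return f"{SCENE_HEAD} " + ", ".join(parts) + "."


# Example: class names taken from the instance labels of one image.
prompt = build_sodi(["airplane", "airplane", "airport"])
```

Deriving the text from the same instance list that produces the ISIM keeps the global description consistent with the instance-level labels.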