RSGen: Enhancing Layout-Driven Remote Sensing Image Generation with Diverse Edge Guidance

Xianbao Hou, Yonghao He, Zeyd Boukhers, John See, Hu Su, Wei Sui, Cong Yang

Abstract

Diffusion models have significantly mitigated the impact of annotated data scarcity in remote sensing (RS). Although recent approaches have successfully harnessed these models to enable diverse and controllable Layout-to-Image (L2I) synthesis, they still suffer from limited fine-grained control and fail to strictly adhere to bounding box constraints. To address these limitations, we propose RSGen, a plug-and-play framework that leverages diverse edge guidance to enhance layout-driven RS image generation. Specifically, RSGen employs a progressive enhancement strategy: 1) it first enriches the diversity of edge maps composited from retrieved training instances via Image-to-Image (I2I) generation; and 2) it then uses these diverse edge maps as conditioning for existing L2I models to enforce pixel-level control within bounding boxes, ensuring that the generated instances strictly adhere to the layout. Extensive experiments across three baseline models demonstrate that RSGen significantly boosts the capabilities of existing L2I models. For instance, with CC-Diff on the DOTA dataset for oriented object detection, RSGen achieves gains of +9.8/+12.0 in YOLOScore mAP50/mAP50-95 and +1.6 in mAP on the downstream detection task. Our code will be publicly available at https://github.com/D-Robotics-AI-Lab/RSGen

Paper Structure (18 sections, 7 equations, 5 figures, 6 tables)

Figures (5)

  • Figure 1: Visualization of controllability between existing L2I methods (MIGC [zhou2024migc], CC-Diff [zhang2024cc], FICGen [wang2025ficgen]) and the same methods equipped with RSGen. While the original methods (a) struggle to adhere to the specified bounding boxes, integrating our module (b) significantly enhances instance alignment and control precision.
  • Figure 2: Overview of RSGen, which consists of the Edge2Edge module (a) and the L2I FGControl module (b), where "CLS" denotes the class label. The Edge2Edge module enhances the diversity of retrieved edge maps through an I2I process. Subsequently, these diverse edges and layout inputs guide the L2I FGControl module, which interacts with the base L2I model to achieve precise pixel-level control. Our framework significantly increases structural diversity while ensuring fine-grained spatial alignment.
  • Figure 3: Visualization of diverse edge maps generated by the Edge2Edge module. Our method employs distinct random seeds to generate varied structural details within the specified bounding boxes, significantly enhancing the diversity of the structural priors.
  • Figure 4: Comparison of ControlNet [zhang2023adding], ControlNet-XS [zavadski2024controlnet], and our FGControl. Standard global control methods suffer from feature entanglement, causing background chaos. Conversely, FGControl strictly confines high-frequency structural guidance within the layout bounding boxes, achieving fine-grained local control without interfering with the global semantic synthesis.
  • Figure 5: Qualitative comparison of generated instances with and without the Edge2Edge module. Incorporating the Edge2Edge module introduces rich structural variations, significantly enhancing both the structural and overall diversity of the generated instances within the specified bounding boxes.
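
The Figure 4 caption describes confining structural guidance to the layout bounding boxes so that it does not disturb the background. A minimal sketch of that general idea follows. It is not the authors' FGControl implementation: the function names `boxes_to_mask` and `apply_local_control`, the axis-aligned `(x0, y0, x1, y1)` box format, and the additive residual are all assumptions made for illustration (DOTA itself uses oriented boxes, which would need a polygon rasterizer instead).

```python
import numpy as np

def boxes_to_mask(height, width, boxes):
    """Return a (height, width) float mask: 1 inside any box, 0 elsewhere.

    boxes: iterable of (x0, y0, x1, y1) pixel coordinates, x1/y1 exclusive.
    (Hypothetical helper; axis-aligned boxes are an assumption.)
    """
    mask = np.zeros((height, width), dtype=np.float32)
    for x0, y0, x1, y1 in boxes:
        # Clip to the image so out-of-range boxes neither wrap nor raise.
        x0, x1 = max(0, x0), min(width, x1)
        y0, y1 = max(0, y0), min(height, y1)
        if x1 > x0 and y1 > y0:
            mask[y0:y1, x0:x1] = 1.0
    return mask

def apply_local_control(base_features, control_residual, boxes):
    """Add control_residual to base_features only inside the boxes.

    Both arrays have shape (C, H, W). Outside the boxes the base
    features pass through unchanged.
    """
    _, h, w = base_features.shape
    mask = boxes_to_mask(h, w, boxes)
    # Broadcast the (H, W) mask across the channel axis.
    return base_features + mask[None, :, :] * control_residual

# Toy example: two non-overlapping boxes on an 8x8 feature map.
rng = np.random.default_rng(0)
base = rng.normal(size=(4, 8, 8)).astype(np.float32)
residual = np.ones((4, 8, 8), dtype=np.float32)
boxes = [(1, 1, 4, 4), (5, 2, 8, 6)]
out = apply_local_control(base, residual, boxes)
```

In this toy setting, `out` differs from `base` only at pixels covered by a box, which mirrors the caption's contrast between global control (residual applied everywhere) and box-confined control.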