Mitigating Long-Tail Bias via Prompt-Controlled Diffusion Augmentation
Buddhi Wijenayake, Nichula Wasalathilake, Roshan Godaliyadda, Vijitha Herath, Parakrama Ekanayake, Vishal M. Patel
TL;DR
The paper addresses long-tail pixel imbalance and domain shift between the Urban and Rural splits of LoveDA in high-resolution remote-sensing semantic segmentation. It introduces a two-stage prompt-controlled diffusion pipeline: Stage A uses a domain- and ratio-conditioned discrete layout diffusion (D3PM) to generate label maps with targeted class proportions, and Stage B uses a layout-guided latent diffusion model with ControlNet to render photorealistic, domain-consistent images from those layouts. A greedy enrichment strategy yields roughly 2000 synthetic label–image pairs, which are mixed with real LoveDA data to train multiple segmentation backbones, with notable gains for minority classes and improved cross-domain generalization. The work demonstrates that controllable generative augmentation is a practical way to mitigate long-tail bias in remote-sensing segmentation. It can complement existing augmentation and loss-based methods, enabling more robust urban/rural land-cover mapping.
Abstract
Semantic segmentation of high-resolution remote-sensing imagery is critical for urban mapping and land-cover monitoring, yet training data typically exhibits severe long-tailed pixel imbalance. In the LoveDA dataset, this challenge is compounded by an explicit Urban/Rural split with distinct appearance and inconsistent class-frequency statistics across domains. We present a prompt-controlled diffusion augmentation framework that synthesizes paired label–image samples with explicit control of both domain and semantic composition. Stage A uses a domain-aware, masked ratio-conditioned discrete diffusion model to generate layouts that satisfy user-specified class-ratio targets while respecting learned co-occurrence structure. Stage B translates layouts into photorealistic, domain-consistent images using Stable Diffusion with ControlNet guidance. Mixing the resulting ratio- and domain-controlled synthetic pairs with real data yields consistent improvements across multiple segmentation backbones, with gains concentrated on minority classes and improved Urban and Rural generalization, demonstrating controllable augmentation as a practical mechanism to mitigate long-tail bias in remote-sensing segmentation. Source code, pretrained models, and synthetic datasets are available on GitHub at https://github.com/Buddhi19/SyntheticGen.git.
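
To make the notion of a class-ratio target concrete, the following minimal sketch computes per-class pixel ratios for a label map and checks them against a user-specified target. It is an illustration under stated assumptions, not the released implementation: the function names `class_ratios` and `meets_target`, the tolerance `tol`, and the convention of seven classes indexed 0 to 6 are all hypothetical choices made here.

```python
import numpy as np

# Assumption: seven land-cover classes indexed 0..6 (hypothetical convention).
NUM_CLASSES = 7


def class_ratios(label_map, num_classes=NUM_CLASSES):
    """Return the fraction of pixels belonging to each class id."""
    # Count pixels per class id; ids >= num_classes are ignored.
    counts = np.bincount(label_map.ravel(), minlength=num_classes)[:num_classes]
    return counts / counts.sum()


def meets_target(label_map, target, tol=0.05):
    """Check that every class ratio lies within `tol` of its target ratio."""
    target = np.asarray(target, dtype=float)
    ratios = class_ratios(label_map, num_classes=len(target))
    return bool(np.all(np.abs(ratios - target) <= tol))


# Toy 4x4 layout: one row of class 1, the rest class 0.
layout = np.zeros((4, 4), dtype=np.int64)
layout[0, :] = 1
ratios = class_ratios(layout)
```

A check of this kind could be applied both to real label maps, to quantify the long-tailed imbalance, and to generated layouts, to verify that a requested composition was approximately met.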
