Let Synthetic Data Shine: Domain Reassembly and Soft-Fusion for Single Domain Generalization
Hao Li, Yubin Xiao, Ke Liang, Mengzhu Wang, Long Lan, Kenli Li, Xinwang Liu
TL;DR
This work tackles single-domain generalization (SDG) under synthetic-data bias, showing that diffusion-generated data can harm performance because of feature-space discrepancies between synthetic and real domains. It introduces Discriminative Domain Reassembly and Soft-Fusion (DRSF), a plug-and-play framework with two modules: DFDR, which decouples and reassembles features, and MDSF, which softly fuses multiple pseudo-domains. Guided by entropy-based supervision and adversarial training, the two modules build a continuous, domain-invariant feature space from a single source. Using latent diffusion models to generate diverse pseudo-target domains, DRSF demonstrates state-of-the-art SDG performance in object detection and semantic segmentation, with modest computational overhead and seamless compatibility with unsupervised domain adaptation (UDA) methods. The approach provides a practical pathway to leverage synthetic data for robust cross-domain generalization in real-world vision tasks, with potential for extension to multi-modal domains and foundation-model-assisted pipelines.
Abstract
Single Domain Generalization (SDG) aims to train models that perform consistently across diverse scenarios using data from a single source. While latent diffusion models (LDMs) show promise in augmenting limited source data, we demonstrate that directly using synthetic data can be detrimental: significant feature distribution discrepancies between the synthetic and real target domains lead to performance degradation. To address this issue, we propose Discriminative Domain Reassembly and Soft-Fusion (DRSF), a training framework that leverages synthetic data to improve model generalization. We employ LDMs to produce diverse pseudo-target domain samples and introduce two key modules to handle distribution bias. First, the Discriminative Feature Decoupling and Reassembly (DFDR) module uses entropy-guided attention to recalibrate channel-level features, suppressing synthetic noise while preserving semantic consistency. Second, the Multi-pseudo-domain Soft Fusion (MDSF) module uses adversarial training with latent-space feature interpolation to create continuous feature transitions between domains. Extensive SDG experiments on object detection and semantic segmentation tasks demonstrate that DRSF achieves substantial performance gains with only marginal computational overhead. Notably, DRSF's plug-and-play architecture enables seamless integration with unsupervised domain adaptation paradigms, underscoring its broad applicability to diverse real-world domain challenges.
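The two ideas named in the abstract, entropy-guided channel recalibration and latent-space feature interpolation, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names (`channel_entropy`, `entropy_gate`, `soft_fuse`), the rule of down-weighting high-entropy channels, and the plain convex interpolation are all assumptions made for illustration, and the adversarial training component is omitted entirely.

```python
import numpy as np


def channel_entropy(features, eps=1e-8):
    """Shannon entropy of each channel's spatial activation distribution.

    features: array of shape (C, H, W). Each channel is turned into a
    probability distribution over its H*W positions with a softmax.
    """
    c = features.shape[0]
    flat = features.reshape(c, -1)
    flat = flat - flat.max(axis=1, keepdims=True)  # numerical stability
    p = np.exp(flat)
    p /= p.sum(axis=1, keepdims=True)
    return -(p * np.log(p + eps)).sum(axis=1)


def entropy_gate(features):
    """Scale each channel by a weight that shrinks as its entropy grows.

    Hypothetical gating rule (an assumption, not taken from the paper):
    weight = 1 - H / log(H*W), clipped to [0, 1]. Assumes H*W > 1.
    """
    h = channel_entropy(features)
    max_h = np.log(features.shape[1] * features.shape[2])
    weights = np.clip(1.0 - h / max_h, 0.0, 1.0)
    return features * weights[:, None, None], weights


def soft_fuse(feat_a, feat_b, lam):
    """Convex interpolation between two feature maps, with lam in [0, 1]."""
    return lam * feat_a + (1.0 - lam) * feat_b


rng = np.random.default_rng(0)
source_feat = rng.normal(size=(4, 3, 3))
source_feat[0] = 0.0  # a flat channel has maximal spatial entropy
pseudo_feat = rng.normal(size=(4, 3, 3))

gated, weights = entropy_gate(source_feat)
mixed = soft_fuse(gated, pseudo_feat, lam=0.25)
```

In this sketch a spatially flat channel receives a weight near zero, while sharply peaked channels keep most of their magnitude. Sweeping `lam` from 0 to 1 traces a straight path in feature space between the two inputs, which is one simple way to picture a continuous transition between domains.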
