Let Synthetic Data Shine: Domain Reassembly and Soft-Fusion for Single Domain Generalization

Hao Li, Yubin Xiao, Ke Liang, Mengzhu Wang, Long Lan, Kenli Li, Xinwang Liu

TL;DR

This work tackles single-domain generalization (SDG) under synthetic-data biases, revealing that diffusion-generated data can harm performance due to feature-space discrepancies. It introduces Discriminative Domain Reassembly and Soft-Fusion (DRSF), a plug-and-play framework comprising DFDR for feature decoupling and reassembly and MDSF for multi-pseudo-domain soft fusion, guided by entropy-based supervision and adversarial training to build a continuous, domain-invariant feature space from a single source. Generating diverse pseudo-target domains with latent diffusion models, DRSF demonstrates state-of-the-art SDG performance in object detection and semantic segmentation with modest computational overhead and seamless compatibility with unsupervised domain adaptation (UDA) methods. The approach provides a practical pathway to leverage synthetic data for robust cross-domain generalization in real-world vision tasks, with broad applicability and potential for extension to multi-modal domains and foundation-model–assisted pipelines.

Abstract

Single Domain Generalization (SDG) aims to train models with consistent performance across diverse scenarios using data from a single source. While latent diffusion models (LDMs) show promise in augmenting limited source data, we demonstrate that directly using synthetic data can be detrimental: significant feature distribution discrepancies between synthetic and real target domains lead to performance degradation. To address this issue, we propose Discriminative Domain Reassembly and Soft-Fusion (DRSF), a training framework that leverages synthetic data to improve model generalization. We employ LDMs to produce diverse pseudo-target domain samples and introduce two key modules to handle distribution bias. First, the Discriminative Feature Decoupling and Reassembly (DFDR) module uses entropy-guided attention to recalibrate channel-level features, suppressing synthetic noise while preserving semantic consistency. Second, the Multi-pseudo-domain Soft Fusion (MDSF) module uses adversarial training with latent-space feature interpolation, creating continuous feature transitions between domains. Extensive SDG experiments on object detection and semantic segmentation tasks demonstrate that DRSF achieves substantial performance gains with only marginal computational overhead. Notably, DRSF's plug-and-play architecture enables seamless integration with unsupervised domain adaptation paradigms, underscoring its broad applicability to diverse real-world domain challenges.

Paper Structure

This paper contains 41 sections, 22 equations, 10 figures, 11 tables.

Figures (10)

  • Figure 1: A comparison of Domain Adaptation (DA), Conventional Domain Generalization (CDG), and our pseudo-target domain-based Single Domain Generalization (SDG) for object detection tasks. DA relies on aligning with target domain data (left), while CDG requires joint training across multiple source domains (middle). In contrast, our strategy only requires single-source data, using diffusion models to generate diverse pseudo-target domains (right).
  • Figure 2: 2D t-SNE visualization of image feature statistics for real and diffusion-generated domains.
  • Figure 3: Impact of the Synthetic Domain on Detector Cross-Domain Performance.
  • Figure 4: The overall DRSF framework. The backbone takes diverse pseudo-target domain data $\{\mathcal{D}^{pt}_i\}_{i=1}^K$, generated from a single source domain $\mathcal{D}^S$, as input. These features are then decoupled into primary features (domain-invariant features) and shared features (domain-specific features). Subsequently, a feature reassembly strategy is employed to mitigate interference arising from style variations. By integrating a multi-domain feature soft-fusion strategy, a continuous cross-domain feature space is constructed. The feature decoupling reassembly is embedded within the backbone network blocks, while the feature soft-fusion processes the output features from the backbone.
  • Figure 5: (a) Illustration of the DRSF framework, where the DFDR module is embedded within the blocks of the backbone network, and the MDSF operates on the backbone's output. (b) Example diagram of the proposed DFDR. This module employs Instance Normalization (IN) to decompose intermediate features, followed by channel recalibration attention for feature reassembly. This feature reassembly strategy reduces interference arising from style variations. (c) The proposed MDSF. This module aims to achieve smooth fusion between the source and pseudo-target domains at the feature level through linear interpolation, thereby constructing a continuous cross-domain feature space.
  • ...and 5 more figures
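
The Figure 5 caption describes two feature-level operations: DFDR decomposes intermediate features with Instance Normalization and reassembles them through channel recalibration, and MDSF linearly interpolates between source and pseudo-target features. The sketch below illustrates both ideas in plain Python on tiny feature maps (a list of channels, each a flat list of activations). It is not the authors' implementation. The residual split `feature - IN(feature)`, the per-channel `gates` standing in for learned attention weights, and the function names are all assumptions made for illustration, since the captions do not give the exact formulas.

```python
import math


def instance_norm(channel, eps=1e-5):
    """Normalize one channel of one sample to zero mean and unit variance."""
    n = len(channel)
    mean = sum(channel) / n
    var = sum((v - mean) ** 2 for v in channel) / n
    return [(v - mean) / math.sqrt(var + eps) for v in channel]


def decouple(feature):
    """Split a feature map into an IN-normalized part and the leftover residual.

    Assumption: the residual (what IN removed) is treated as the
    style-dependent part; the source does not spell out this formula.
    """
    normalized = [instance_norm(ch) for ch in feature]
    residual = [
        [v - nv for v, nv in zip(ch, nch)]
        for ch, nch in zip(feature, normalized)
    ]
    return normalized, residual


def reassemble(normalized, residual, gates):
    """Add back a gated share of the residual, one gate per channel.

    `gates` is a hypothetical stand-in for learned channel attention:
    1.0 restores the channel fully, 0.0 keeps only the normalized part.
    """
    return [
        [nv + g * rv for nv, rv in zip(nch, rch)]
        for nch, rch, g in zip(normalized, residual, gates)
    ]


def soft_fuse(source_feat, pseudo_feat, lam):
    """Linear interpolation: lam * source + (1 - lam) * pseudo-target."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    return [
        [lam * s + (1.0 - lam) * p for s, p in zip(sch, pch)]
        for sch, pch in zip(source_feat, pseudo_feat)
    ]


if __name__ == "__main__":
    source = [[1.0, 2.0, 3.0, 4.0], [0.5, 0.5, 1.5, 1.5]]
    pseudo = [[2.0, 2.0, 6.0, 6.0], [1.0, 0.0, 1.0, 0.0]]

    norm, res = decouple(source)
    recalibrated = reassemble(norm, res, gates=[0.3, 0.8])
    fused = soft_fuse(recalibrated, pseudo, lam=0.5)
    print(fused)
```

Sweeping `lam` from 0 to 1 traces a straight path between the two feature maps, which is one simple reading of the "continuous cross-domain feature space" in the caption. How the interpolation coefficient is chosen during training, and how the adversarial objective uses the fused features, is not specified in this section.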