DSDRNet: Disentangling Representation and Reconstruct Network for Domain Generalization

Juncheng Yang, Zuchao Li, Shuai Xie, Wei Yu, Shijun Li

TL;DR

DSDRNet tackles domain generalization by learning disentangled representations and reconstructions that transfer to unseen domains. The method introduces a dual-stream framework with AdaIN-based disentanglement and a two-stage cyclic reconstruction, enforcing both intra-instance and inter-instance semantic constraints to preserve semantic structure while varying style attributes. The approach combines reconstruction, cross-cycle consistency, semantic adversarial supervision, and cross-entropy and KL-divergence classification signals, achieving state-of-the-art or competitive results on Digits-DG, PACS, OfficeHome, and DomainNet. This improves generalization under domain shift without relying on domain labels, though it requires careful loss balancing and leaves room for further interpretability work via causal analysis.

Abstract

Domain generalization faces challenges due to the distribution shift between training and testing sets, and the presence of unseen target domains. Common solutions include domain alignment, meta-learning, data augmentation, or ensemble learning, all of which rely on domain labels or domain adversarial techniques. In this paper, we propose a Dual-Stream Separation and Reconstruction Network, dubbed DSDRNet. It is a disentanglement-reconstruction approach that integrates both inter-instance and intra-instance features through dual-stream fusion. The method introduces novel supervision signals by combining inter-instance semantic distance and intra-instance similarity. Incorporating Adaptive Instance Normalization (AdaIN) into a two-stage cyclic reconstruction process strengthens the self-disentangled reconstruction signals and facilitates model convergence. Extensive experiments on four benchmark datasets demonstrate that DSDRNet outperforms other popular methods in terms of domain generalization capabilities.
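
To make the AdaIN step concrete, here is a minimal NumPy sketch of the standard AdaIN operation: the content feature map is normalized per channel and then rescaled with the per-channel statistics of a style feature map. The function name `adain`, the `(N, C, H, W)` layout, and the `eps` value are assumptions made for illustration; the summary above does not say where or how DSDRNet applies AdaIN inside its network.

```python
import numpy as np


def adain(content, style, eps=1e-5):
    """Standard AdaIN on feature maps of shape (N, C, H, W).

    Illustrative sketch only: the name, layout, and eps are assumptions,
    not details taken from DSDRNet.
    """
    # Per-sample, per-channel statistics over the spatial dimensions.
    c_mean = content.mean(axis=(2, 3), keepdims=True)
    c_std = content.std(axis=(2, 3), keepdims=True) + eps
    s_mean = style.mean(axis=(2, 3), keepdims=True)
    s_std = style.std(axis=(2, 3), keepdims=True) + eps
    # Remove the content statistics, then impose the style statistics.
    return s_std * (content - c_mean) / c_std + s_mean


rng = np.random.default_rng(0)
content = rng.normal(size=(2, 3, 8, 8))
style = 3.0 * rng.normal(size=(2, 3, 8, 8)) + 5.0
out = adain(content, style)
```

After the call, `out` keeps the spatial pattern of `content` but carries (up to `eps`) the channel-wise mean and standard deviation of `style`, which is the sense in which AdaIN swaps attribute statistics while leaving structure intact.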

Paper Structure

This paper contains 21 sections, 15 equations, 4 figures, and 5 tables.

Figures (4)

  • Figure 1: DSDRNet and competing methods: domain adaptation (ganin2016domain, hoffman2018cycada) and representation disentanglement (lee2021dranet). DSDRNet learns a generalized feature representation through disentanglement, encoding, and reconstruction, organized as a cyclic reconstruction loop over E, S, and G.
  • Figure 2: Illustration of the proposed DSDRNet. In the first stage, the disentanglement-reconstruction phase, samples $A$ and $B$ are randomly selected from the source domain. They are processed by the separation module $S$ and the encoding module $E$, which extract attributes $A^v$, semantics $A^s$, and feature information $A^f$ from sample $A$, and attributes $B^v$, semantics $B^s$, and feature information $B^f$ from sample $B$. The generator $G$ uses ($A^s$, $B^v$) and ($B^s$, $A^v$) to generate new images $U$ and $Q$, and uses ($A^s$, $A^v$) and ($B^s$, $B^v$) to reconstruct images $\bar{A}$ and $\bar{B}$. In the second stage, $U$ and $Q$ are passed through $S$, $E$, and $G$ again to generate new images $A'$ and $B'$. The model improves generalization performance through the reconstruction $\mathcal{L}_{\text{recon}}$, intra-instance reconstruction $\mathcal{L}_{\text{intra}}$, inter-instance reconstruction $\mathcal{L}_{\text{inter}}$, cross-cycle consistency $\mathcal{L}_{\text{cycle}}$, classification $\mathcal{L}_{\text{ce}}$ and $\mathcal{L}_{\text{kl}}$, and semantic adversarial $\mathcal{L}_{\text{adv}}$ terms.
  • Figure 3: Behavior of DSDRNet's $\mathcal{L}_{\text{adv}}$, $\mathcal{L}_{\text{recon}}$, $\mathcal{L}_{\text{cycle}}$, $\mathcal{L}_{\text{intra}}$, and $\mathcal{L}_{\text{ce}}$ terms on the MNIST-M dataset.
  • Figure 4: t-SNE visualization of OfficeHome. Different colors represent the different categories in the domain. (a) Feature distribution of the raw OfficeHome data. (b) Feature distribution of the samples after processing by DSDRNet.
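
The two-stage loop described in the Figure 2 caption can be sketched with toy modules. Everything below is an assumption made for illustration: `separate` stands in for $S$ and $E$ by treating instance-normalized features as semantics and per-channel mean and standard deviation as attributes, `generate` stands in for $G$ by re-applying those statistics, and the second-stage pairing ($U^s$ with $Q^v$, $Q^s$ with $U^v$) and the L1 distance are the usual cross-cycle choices rather than details confirmed by the caption. The real modules are learned networks.

```python
import numpy as np

EPS = 1e-5


def separate(x):
    """Toy stand-in for S and E: split x (N, C, H, W) into semantics and attributes."""
    mean = x.mean(axis=(2, 3), keepdims=True)
    std = x.std(axis=(2, 3), keepdims=True) + EPS
    semantics = (x - mean) / std  # structure with channel statistics removed
    return semantics, (mean, std)  # attributes = channel statistics


def generate(semantics, attributes):
    """Toy stand-in for G: re-apply attribute statistics to semantics."""
    mean, std = attributes
    return std * semantics + mean


def two_stage_cycle(a, b):
    # Stage 1: disentangle both samples, reconstruct them, and swap attributes.
    a_s, a_v = separate(a)
    b_s, b_v = separate(b)
    a_bar, b_bar = generate(a_s, a_v), generate(b_s, b_v)
    u, q = generate(a_s, b_v), generate(b_s, a_v)
    # Stage 2: disentangle the generated images and swap attributes back.
    u_s, u_v = separate(u)
    q_s, q_v = separate(q)
    a_prime, b_prime = generate(u_s, q_v), generate(q_s, u_v)
    return a_bar, b_bar, u, q, a_prime, b_prime


def l1(x, y):
    """Mean absolute difference, used here as a placeholder distance."""
    return float(np.abs(x - y).mean())


rng = np.random.default_rng(0)
a = rng.normal(size=(1, 3, 8, 8))
b = 2.0 * rng.normal(size=(1, 3, 8, 8)) + 4.0
a_bar, b_bar, u, q, a_prime, b_prime = two_stage_cycle(a, b)
recon_loss = l1(a, a_bar) + l1(b, b_bar)      # placeholder for L_recon
cycle_loss = l1(a, a_prime) + l1(b, b_prime)  # placeholder for L_cycle
```

Because these toy modules are almost exactly invertible, both placeholder losses come out close to zero while `u` and `q` still differ visibly from `a` and `b`. With learned networks the same quantities are nonzero and act as training signals.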