Unsupervised Domain Adaptation via Style-Aware Self-intermediate Domain

Lianyu Wang, Meng Wang, Daoqiang Zhang, Huazhu Fu

TL;DR

This paper tackles unsupervised domain adaptation under large domain gaps by introducing the Style-aware Self-Intermediate Domain (SSID), a strategy that generates labeled, style-rich intermediate representations to bridge the source and target domains. It combines a style-aware feature fusion (SAFF) module, an external memory bank, and intra- and inter-domain losses to improve both transferability and discriminability, and it is supported by a theoretical convergence argument for the SSID sampling scheme. The loss $\mathcal{L} = \mathcal{L}_{intra} + \alpha \mathcal{L}_{inter}$ integrates cross-entropy on labeled data, mutual-information terms, and KL-based alignment between memory-derived centers and current features, with a rising-style schedule to stabilize training. Experiments on VisDA-2017 and Office-Home show state-of-the-art performance across multiple backbones and confirm that SSID can be added as a plug-and-play component to diverse UDA backbones.
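The summary names the ingredients of $\mathcal{L} = \mathcal{L}_{intra} + \alpha \mathcal{L}_{inter}$ but not their exact form. The following sketch shows one hypothetical way such terms are commonly combined; it is not the authors' implementation. The split (cross-entropy minus an information-maximization estimate of mutual information for $\mathcal{L}_{intra}$, KL divergence between similarity-to-center profiles for $\mathcal{L}_{inter}$), the temperature `tau`, and all function names are assumptions made for illustration.

```python
import numpy as np

EPS = 1e-12  # guards log(0)


def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def cross_entropy(logits, labels):
    """Mean cross-entropy on labeled samples (e.g. source or SSID features)."""
    p = softmax(logits)
    return float(-np.mean(np.log(p[np.arange(len(labels)), labels] + EPS)))


def mutual_information(logits):
    """Information-maximization estimate I(x; y) = H(mean_x p) - mean_x H(p)."""
    p = softmax(logits)
    p_mean = p.mean(axis=0)
    h_marginal = -np.sum(p_mean * np.log(p_mean + EPS))
    h_conditional = -np.mean(np.sum(p * np.log(p + EPS), axis=1))
    return float(h_marginal - h_conditional)


def kl_to_centers(features, centers, labels, tau=0.1):
    """Mean KL(q || p): q is the similarity profile of each sample's class center
    over all centers, p is the similarity profile of the current feature."""
    q = softmax(centers[labels] @ centers.T / tau)
    p = softmax(features @ centers.T / tau)
    return float(np.mean(np.sum(q * (np.log(q + EPS) - np.log(p + EPS)), axis=1)))


def total_loss(src_logits, src_labels, tgt_logits, features, centers, labels,
               alpha=1.0):
    """Hypothetical L = L_intra + alpha * L_inter with
    L_intra = CE(labeled) - MI(target) and L_inter = KL alignment to centers."""
    l_intra = cross_entropy(src_logits, src_labels) - mutual_information(tgt_logits)
    l_inter = kl_to_centers(features, centers, labels)
    return l_intra + alpha * l_inter


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    centers = rng.normal(size=(3, 8))                      # 3 classes, 8-dim features
    labels = np.array([0, 1, 2, 0])
    feats = centers[labels] + 0.1 * rng.normal(size=(4, 8))  # features near their centers
    logits = rng.normal(size=(4, 3))
    print(total_loss(logits, labels, logits, feats, centers, labels, alpha=0.5))
```

Maximizing mutual information is expressed here by subtracting it from the loss, so that minimizing $\mathcal{L}$ encourages confident yet class-balanced predictions; the actual weighting in the paper may differ.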

Abstract

Unsupervised domain adaptation (UDA), which transfers knowledge from a label-rich source domain to a related but unlabeled target domain, has attracted considerable attention. Reducing inter-domain differences has always been crucial to improving UDA performance, especially for tasks with a large gap between the source and target domains. To this end, we propose a novel style-aware feature fusion method (SAFF) that bridges the large domain gap and transfers knowledge while alleviating the loss of class-discriminative information. Inspired by human transitive inference and learning ability, we investigate a novel style-aware self-intermediate domain (SSID) that links two seemingly unrelated concepts through a series of synthesized intermediate auxiliary concepts. Specifically, we propose a novel SSID learning strategy that selects samples from both the source and target domains as anchors and then randomly fuses the object and style features of these anchors to generate labeled, style-rich intermediate auxiliary features for knowledge transfer. Moreover, we design an external memory bank that stores and updates specified labeled features to obtain stable class features and class-wise style features. Based on this memory bank, intra- and inter-domain loss functions are designed to improve class recognition ability and feature compatibility, respectively. Meanwhile, we simulate the rich latent feature space of SSID through infinite sampling and establish the convergence of the loss function through mathematical analysis. Finally, we conduct comprehensive experiments on commonly used domain adaptation benchmarks to evaluate the proposed SAFF, and the results show that SAFF can be easily combined with different backbone networks and improves their performance as a plug-in-plug-out module.
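The abstract does not say how the external memory bank is updated. A common way to keep class statistics stable across mini-batches is an exponential moving average (EMA), and the sketch below assumes it. The class name `ClassMemoryBank`, the `momentum` value, and the choice to keep class centers from pooled embeddings and class-wise style as the channel-wise mean and standard deviation of feature maps are all hypothetical.

```python
import numpy as np


class ClassMemoryBank:
    """Hypothetical memory bank: per class, an embedding center and a style entry
    (channel-wise mean and std), both updated by an exponential moving average."""

    def __init__(self, num_classes, embed_dim, channels, momentum=0.9):
        self.m = momentum
        self.centers = np.zeros((num_classes, embed_dim))    # stable class features
        self.style_mu = np.zeros((num_classes, channels))    # class-wise style: mean
        self.style_sigma = np.ones((num_classes, channels))  # class-wise style: std
        self.initialized = np.zeros(num_classes, dtype=bool)

    def _ema(self, old, new):
        return self.m * old + (1.0 - self.m) * new

    def update(self, embeddings, feats, labels):
        """embeddings: (N, D); feats: (N, C, H, W); labels: (N,) source or SSID labels."""
        for c in np.unique(labels):
            mask = labels == c
            center = embeddings[mask].mean(axis=0)
            mu = feats[mask].mean(axis=(2, 3)).mean(axis=0)    # average per-sample channel mean
            sigma = feats[mask].std(axis=(2, 3)).mean(axis=0)  # average per-sample channel std
            if not self.initialized[c]:
                # The first batch containing class c fills its slot directly
                self.centers[c] = center
                self.style_mu[c] = mu
                self.style_sigma[c] = sigma
                self.initialized[c] = True
            else:
                self.centers[c] = self._ema(self.centers[c], center)
                self.style_mu[c] = self._ema(self.style_mu[c], mu)
                self.style_sigma[c] = self._ema(self.style_sigma[c], sigma)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    bank = ClassMemoryBank(num_classes=3, embed_dim=16, channels=8)
    for _ in range(5):  # five mini-batches of labeled features
        labels = rng.integers(0, 3, size=6)
        bank.update(rng.normal(size=(6, 16)), rng.normal(size=(6, 8, 4, 4)), labels)
    print(bank.initialized, bank.centers.shape)
```

A high momentum makes the stored centers change slowly, which is one plausible way to obtain the "stable class features" the abstract mentions; only labeled features (source, or SSID features that inherit source labels) are written to the bank here.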

Paper Structure (25 sections, 15 equations, 5 figures, 7 tables, 1 algorithm)

Figures (5)

  • Figure 1: The illustration of the proposed SSID. In the UDA task, the significant gap between the source and target domains (represented by the blue dashed line) makes it difficult to transfer class-discriminative information. The proposed SSID learning strategy addresses this by randomly selecting samples from multiple domains as anchors and computing their object and style features. It then generates labeled intermediate auxiliary class-discriminative information by randomly combining the object and style features (indicated by the red dashed line). The labeled, style-rich SSID serves as a bridge, implicitly transferring class information from the source domain to the target domain.
  • Figure 2: The illustration of our proposed framework, including the SSID learning strategy, the external memory bank, and intra/inter-domain losses. Samples from the source domain, SSID, and target domain are fed into the feature extractor in parallel, denoted by blue, green, and orange, respectively. The SSID learning strategy is deployed after each feature extractor block, followed by the external memory bank.
  • Figure 3: The illustration of style fusion from content images to style images. Style features of the style images are used to reassign the style of the object features of the content images, yielding the fused features. The content and style of the fused images are consistent with the content images and the style images, respectively. Features can be visualized with a trained generator.
  • Figure 4: Left: The transferability comparison: average inter-domain distance $D_{s \leftrightarrow t}$ of different backbones and backbone + SSID on VisDA-2017. Right: The discriminability comparison: accuracy on all samples in the source and target domains of different backbones and backbone + SSID on the VisDA-2017 dataset.
  • Figure 5: Grad-CAM visualizations on VisDA-2017. The "Image" rows display randomly selected original images, while the "SWIN" and "SWIN+SSID" rows show the Grad-CAM results obtained by SWIN and SWIN+SSID, respectively. The columns correspond to different categories in the dataset, such as Airplanes, Bicycles, Bus, and so on. The intensity of red indicates the network's attention, with higher attention in the red regions and lower attention in the blue-violet regions.
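Figure 3 describes style reassignment: the object features of a content image are re-rendered with the style features of a style image. A minimal sketch, assuming style is represented by the channel-wise mean and standard deviation of a feature map as in Adaptive Instance Normalization (AdaIN), follows. The helper `random_fusion` is a hypothetical illustration of SSID-style random anchor mixing; the convex mixing weight `lam`, the uniform anchor sampling, and all function names are assumptions, not the authors' code.

```python
import numpy as np


def split_object_style(x, eps=1e-5):
    """Split (N, C, H, W) maps into an instance-normalized object part and a style
    part given by the channel-wise mean and standard deviation."""
    mu = x.mean(axis=(2, 3), keepdims=True)
    sigma = np.sqrt(x.var(axis=(2, 3), keepdims=True) + eps)
    return (x - mu) / sigma, mu, sigma


def style_reassign(content, style, eps=1e-5):
    """AdaIN-style fusion: object features of `content` with the style of `style`
    (paired sample by sample, so both batches must have the same size)."""
    obj, _, _ = split_object_style(content, eps)
    _, mu_s, sigma_s = split_object_style(style, eps)
    return obj * sigma_s + mu_s


def random_fusion(src, tgt, rng, eps=1e-5):
    """Hypothetical SSID-like sampling: each source anchor keeps its object features
    (and therefore its label) and receives a random convex mix of its own style
    and the style of a randomly chosen target anchor."""
    obj, mu_s, sig_s = split_object_style(src, eps)
    _, mu_t, sig_t = split_object_style(tgt, eps)
    idx = rng.integers(0, len(tgt), size=len(src))         # random target anchors
    lam = rng.uniform(0.0, 1.0, size=(len(src), 1, 1, 1))  # per-sample mixing weight
    mu = lam * mu_s + (1.0 - lam) * mu_t[idx]
    sigma = lam * sig_s + (1.0 - lam) * sig_t[idx]
    return obj * sigma + mu


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    src = rng.normal(size=(4, 8, 7, 7))                      # labeled source feature maps
    tgt = rng.normal(loc=1.0, scale=3.0, size=(5, 8, 7, 7))  # unlabeled target feature maps
    fused = random_fusion(src, tgt, rng)
    print(fused.shape)  # (4, 8, 7, 7): same layout as src, so src labels carry over
```

Because every fused sample keeps the normalized object features of a source anchor while its style is drawn from anywhere between the source and target statistics, repeated sampling produces a spread of labeled intermediate features, which is the intuition behind the "infinite sampling" view of SSID.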