Modeling Hierarchical Structural Distance for Unsupervised Domain Adaptation

Yingxue Xu, Guihua Wen, Yang Hu, Pei Yang

TL;DR

DeepHOT introduces a hierarchical optimal transport framework for unsupervised domain adaptation (UDA). It unifies domain-level OT with image-level OT and uses the image-level OT distance as the ground cost that shapes domain alignment. The method employs mini-batch unbalanced OT at the domain level and an image-level OT based on the Sliced Wasserstein Distance (SWD) to efficiently capture correspondences between local patches. Together these form the nested OT objective $\mathcal{H}^n(\mu_s^n, \mu_t^n) = \mathcal{W}_d^n(\mu_s^n, \mu_t^n, \mathcal{W}_{img})$. Empirical results on Office-Home, Office-31, Digits, and VisDA show consistent improvements over state-of-the-art OT-based and non-OT methods, and ablations confirm the benefit of both the domain-level and image-level components. The approach is robust and scalable owing to the efficiency of SWD and mini-batch unbalanced OT, which highlights the value of fine-grained, sample-level structure in transfer learning and practical UDA deployments.
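The nested objective can be read as an ordinary OT problem between two mini-batches whose cost entries are themselves OT distances between images. The expansion below is an illustrative reading, not the paper's exact formulation. It assumes that each mini-batch is an empirical measure over n samples with weights a and b, that each image x is represented by its set of patch features, and that the standard Kantorovich form applies. The last line shows one common unbalanced relaxation, with a hypothetical penalty weight $\tau$, in which KL penalties replace the hard marginal constraints.

```latex
\begin{aligned}
&\mu_s^n = \sum_{i=1}^{n} a_i\,\delta_{x_i^s}, \quad
 \mu_t^n = \sum_{j=1}^{n} b_j\,\delta_{x_j^t}, \quad
 \mathbf{C}^n_{ij} = \mathcal{W}_{img}(x_i^s, x_j^t), \\
&\mathcal{H}^n(\mu_s^n, \mu_t^n)
 = \min_{\mathbf{T} \in \Pi(a, b)} \langle \mathbf{T}, \mathbf{C}^n \rangle, \quad
 \Pi(a, b) = \{\mathbf{T} \ge 0 : \mathbf{T}\mathbf{1} = a,\ \mathbf{T}^{\top}\mathbf{1} = b\}, \\
&\text{unbalanced relaxation: } \min_{\mathbf{T} \ge 0}\ \langle \mathbf{T}, \mathbf{C}^n \rangle
 + \tau\,\mathrm{KL}(\mathbf{T}\mathbf{1} \,\|\, a)
 + \tau\,\mathrm{KL}(\mathbf{T}^{\top}\mathbf{1} \,\|\, b).
\end{aligned}
```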

Abstract

Unsupervised domain adaptation (UDA) aims to estimate a transferable model for unlabeled target domains by exploiting labeled source data. Optimal Transport (OT) based methods have recently proven to be a promising solution for UDA, with a solid theoretical foundation and competitive performance. However, most of these methods focus solely on domain-level OT alignment, leveraging the geometry of domains to learn domain-invariant features from global embeddings of images. Such global representations may destroy image structure, leading to the loss of local details that carry category-discriminative information. This study proposes an end-to-end Deep Hierarchical Optimal Transport method (DeepHOT), which aims to learn both domain-invariant and category-discriminative representations by mining hierarchical structural relations among domains. The main idea is to incorporate a domain-level OT and an image-level OT into a unified OT framework, hierarchical optimal transport, to model the underlying geometry in both domain space and image space. In the DeepHOT framework, an image-level OT serves as the ground distance metric for the domain-level OT, leading to the hierarchical structural distance. Compared with the ground distance of conventional domain-level OT, the image-level OT captures structural associations among local regions of images that are beneficial to classification. In this way, DeepHOT not only aligns domains through the domain-level OT but also enhances discriminative power through the image-level OT. Moreover, to overcome the high computational complexity of OT, we propose a robust and efficient implementation of DeepHOT that approximates the original OT with the sliced Wasserstein distance at the image level and performs mini-batch unbalanced OT at the domain level.
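The abstract states that the image-level OT is approximated with the sliced Wasserstein distance. A minimal NumPy sketch of that approximation follows. It assumes that each image is represented by an (m, d) array of patch features and that every image has the same number m of uniformly weighted patches. The function name `sliced_wasserstein`, the number of projections, and the choice of the 2-Wasserstein variant are illustrative assumptions, not details taken from the paper.

```python
import numpy as np


def sliced_wasserstein(X, Y, n_projections=64, seed=0):
    """Monte Carlo estimate of the sliced 2-Wasserstein distance between two
    equally sized, uniformly weighted patch sets X and Y of shape (m, d).

    Hypothetical helper for illustration; assumes the same patch count per image.
    """
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    if X.shape != Y.shape:
        raise ValueError("this sketch assumes equal numbers of patches per image")
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    # Random directions drawn uniformly on the unit sphere S^{d-1}
    theta = rng.standard_normal((n_projections, d))
    theta /= np.linalg.norm(theta, axis=1, keepdims=True)
    # Project both patch sets onto every direction: shape (m, n_projections)
    X_proj = X @ theta.T
    Y_proj = Y @ theta.T
    # In 1-D, OT between equal-size uniform sets pairs the sorted values
    X_sorted = np.sort(X_proj, axis=0)
    Y_sorted = np.sort(Y_proj, axis=0)
    # Mean squared displacement per direction, then averaged over directions
    sw2 = np.mean((X_sorted - Y_sorted) ** 2)
    return float(np.sqrt(sw2))


# Usage: two images, each as a hypothetical 7x7 grid of 256-d patch features
rng = np.random.default_rng(1)
patches_src = rng.standard_normal((49, 256))
patches_tgt = rng.standard_normal((49, 256))
print(sliced_wasserstein(patches_src, patches_tgt))
```

Because each one-dimensional OT problem reduces to sorting, the cost per image pair is O(L·m·d + L·m log m) for L projections. This kind of saving is what motivates replacing exact OT at the image level, where one such distance is needed for every source-target pair in a mini-batch.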

Paper Structure (19 sections, 11 equations, 7 figures, 5 tables, 1 algorithm)

Figures (7)

  • Figure 1: Overview architecture of DeepHOT. It consists of a domain-level OT and an image-level OT, where the output of the latter provides the entries of the cost matrix $\textbf{C}^n$ for the former. The distance $\mathcal{H}^n(\mu_s^n, \mu_t^n)$ between the domains $\mu_s^n$ and $\mu_t^n$ is then computed via the domain-level OT. Colors refer to different classes in the domain-level OT and to different patches in the image-level OT.
  • Figure 2: t-SNE visualization. Representations for task C$\rightarrow$P on the Office-Home dataset (65 classes) and task U$\rightarrow$M on the Digits dataset (10 classes) are visualized for various methods in (a)-(c), where each color denotes a class.
  • Figure 3: Comparison of training time.
  • Figure 4: Image-level Transport Plan (Consistent Category) on P-C, C-R and A-P adaptation scenarios.
  • Figure 5: The effect of batch size (from 65 to 130) for DeepHOT with/without UOT.
  • ...and 2 more figures
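Figure 1 describes the domain-level OT as consuming a cost matrix $\textbf{C}^n$ whose entries come from the image-level OT. The TL;DR and abstract specify that this domain-level problem is a mini-batch unbalanced OT. The sketch below assumes the common entropic formulation with KL-relaxed marginals, solved by Sinkhorn-style scaling iterations. The names `unbalanced_sinkhorn` and `domain_distance`, the parameters `eps` and `tau`, and the uniform mini-batch weights are hypothetical, and the paper's solver may differ.

```python
import numpy as np


def unbalanced_sinkhorn(C, a, b, eps=0.1, tau=1.0, n_iters=500):
    """Entropic unbalanced OT (hypothetical helper): tau-weighted KL penalties
    replace the marginal constraints, and scaling updates solve the problem.

    C: (n_s, n_t) cost matrix; a: (n_s,) source weights; b: (n_t,) target weights.
    Returns the transport plan T of shape (n_s, n_t).
    """
    C = np.asarray(C, dtype=float)
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    K = np.exp(-C / eps)          # Gibbs kernel; keep C / eps moderate to avoid underflow
    u = np.ones_like(a)
    v = np.ones_like(b)
    fi = tau / (tau + eps)        # fi -> 1 recovers balanced Sinkhorn as tau -> inf
    for _ in range(n_iters):
        u = (a / (K @ v)) ** fi
        v = (b / (K.T @ u)) ** fi
    return u[:, None] * K * v[None, :]


def domain_distance(C, eps=0.1, tau=1.0, n_iters=500):
    """Transport cost <T, C> for one mini-batch, assuming uniform sample weights."""
    C = np.asarray(C, dtype=float)
    n_s, n_t = C.shape
    a = np.full(n_s, 1.0 / n_s)
    b = np.full(n_t, 1.0 / n_t)
    T = unbalanced_sinkhorn(C, a, b, eps=eps, tau=tau, n_iters=n_iters)
    return float(np.sum(T * C)), T


# Usage with a placeholder cost matrix; in DeepHOT each entry would be an
# image-level OT distance between a source image and a target image.
rng = np.random.default_rng(0)
C = rng.random((8, 8))
dist, T = domain_distance(C)
print(dist, T.shape)
```

Lowering `tau` lets the plan leave some mass untransported. This is one way unbalanced OT can limit the influence of mini-batch samples that have no good counterpart in the other batch. Raising `tau` moves the solution toward balanced OT.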