Mixture-of-Experts for Open Set Domain Adaptation: A Dual-Space Detection Approach

Zhenbang Du, Jiayu An, Yunlu Tu, Jiahao Hong, Dongrui Wu

TL;DR

This paper proposes Dual-Space Detection (DSD), which exploits inconsistencies between the image feature space and the routing feature space to detect unknown class samples without any threshold, and introduces a Graph Router to make better use of the spatial information among image patches.

Abstract

Open Set Domain Adaptation (OSDA) aims to cope with the distribution and label shifts between the source and target domains simultaneously, performing accurate classification for known classes while identifying unknown class samples in the target domain. Most existing OSDA approaches rely on the final image feature space of deep models, require manually tuned thresholds, and may easily misclassify unknown samples as known classes. Mixture-of-Experts (MoE) could be a remedy. Within an MoE, different experts handle distinct input features, producing unique expert routing patterns for various classes in a routing feature space. As a result, unknown class samples may display expert routing patterns that differ from those of known classes. In this paper, we propose Dual-Space Detection (DSD), which exploits the inconsistencies between the image feature space and the routing feature space to detect unknown class samples without any threshold. A Graph Router is further introduced to make better use of the spatial information among image patches. Experiments on three different datasets validate the effectiveness and superiority of our approach.
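
The threshold-free idea in the abstract can be illustrated with a minimal sketch: assign each target sample one pseudo-label in the image feature space and one in the routing feature space, then treat samples whose two labels disagree as unknown-class candidates. The nearest-prototype rule with cosine similarity and the names `cosine`, `nearest_prototype`, and `split_by_consistency` are assumptions made for illustration; the abstract does not specify how pseudo-labels are assigned.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def nearest_prototype(feature, prototypes):
    """Index of the class prototype most similar to `feature` (assumed pseudo-label rule)."""
    return max(range(len(prototypes)), key=lambda k: cosine(feature, prototypes[k]))


def split_by_consistency(image_feats, routing_feats, image_protos, routing_protos):
    """Split sample indices by whether the two spaces agree on the pseudo-label.

    No similarity threshold is involved: the only test is label agreement.
    Returns (consistent_indices, inconsistent_indices).
    """
    consistent, inconsistent = [], []
    for i, (f_img, f_rt) in enumerate(zip(image_feats, routing_feats)):
        y_img = nearest_prototype(f_img, image_protos)    # label in image feature space
        y_rt = nearest_prototype(f_rt, routing_protos)    # label in routing feature space
        (consistent if y_img == y_rt else inconsistent).append(i)
    return consistent, inconsistent
```

The two spaces may have different dimensionalities, since each sample is compared only against prototypes of its own space. In this sketch the inconsistent indices are merely candidates for the unknown class, not a final decision.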

Paper Structure

This paper contains 24 sections, 18 equations, 7 figures, 5 tables.

Figures (7)

  • Figure 1: Comparison between existing OSDA approaches (top) and our proposed DSD (bottom). The transparency level of images corresponds to the degree of activation. DSD utilizes both the image feature space and the routing feature space in MoE to identify unknown samples, leading to improved performance.
  • Figure 2: Graph Router MoE (left) and DSD (right). We store the model's final output image features in the image feature memory, and the routing features (note that they are different from the routing scores) in the routing feature memory. We then assign pseudo-labels to target domain samples in both spaces, and those with inconsistent pseudo-labels are clustered to obtain unknown class centers. Finally, we conduct contrastive learning on all samples and update both memory banks.
  • Figure 3: Overview of the Graph Router. (Left) The conversion from the embeddings to the graph. Each patch embedding serves as a node. The edges are formed by connecting adjacent patches of the original image and linking every patch to the class token. (Right) The graph is input into the Graph Router. The routing features are extracted from the GAT layer, and the routing scores are obtained from the FC layer. 'Norm' denotes the normalization operation.
  • Figure 4: Hyper-parameter analysis on Office31 and Art$\to$Product, Clipart$\to$Real-World, Product$\to$Clipart and Real-World$\to$Art on OfficeHome. (a) $N$, the total number of experts; (b) $K$, the number of experts selected during each routing step; (c) $m$ in Eq. (\ref{eq:m_ins}); and (d) $\gamma$.
  • Figure 5: (a)-(c) Performance of DCC, GLC and DSD (Ours) on Office31 and Clipart$\to$Art, Product$\to$Real-World on OfficeHome with different numbers of known classes. (d) The learning curves on VisDA, where 'Incon Acc' denotes the percentage of samples with inconsistent pseudo-labels that come from unknown classes.
  • ...and 2 more figures
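
The embedding-to-graph conversion described in the Figure 3 caption can be sketched as follows: every patch embedding is a node, adjacent patches are connected, and every patch is linked to the class token. Several details here are assumptions made for illustration and are not stated in the caption: the class token is node 0, patches are numbered row-major from 1, "adjacent" means the 4-neighbourhood (up, down, left, right), and edges are undirected. The function name `build_patch_graph` is hypothetical.

```python
def build_patch_graph(rows, cols):
    """Return the sorted undirected edge list for a rows x cols patch grid.

    Node 0 is the class token (assumed convention); the patch at grid
    position (r, c) is node 1 + r * cols + c. Each edge is stored once
    as (smaller_id, larger_id).
    """
    def node(r, c):
        return 1 + r * cols + c

    edges = set()
    for r in range(rows):
        for c in range(cols):
            n = node(r, c)
            edges.add((0, n))                      # patch <-> class token
            if c + 1 < cols:
                edges.add((n, node(r, c + 1)))     # right neighbour
            if r + 1 < rows:
                edges.add((n, node(r + 1, c)))     # bottom neighbour
    return sorted(edges)
```

Under these assumptions the edge count is `rows * cols + rows * (cols - 1) + cols * (rows - 1)`. An edge list in this form could then be passed to a graph attention layer, which is where the caption says the routing features are extracted.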