Domain Adaptation for Multi-label Image Classification: a Discriminator-free Approach

Inder Pal Singh, Enjie Ghorbel, Anis Kacem, Djamila Aouada

TL;DR

This work tackles unsupervised domain adaptation for multi-label image classification by removing the need for a separate domain discriminator. It reuses the task classifier as an implicit adversarial critic and models the distribution of classifier outputs with a two-component Gaussian Mixture Model, using the Fréchet distance between source and target components as the domain discrepancy. To enable end-to-end learning, it introduces DeepEM, a differentiable EM-inspired block that estimates GMM parameters in a single forward pass, eliminating iterative EM costs. Across diverse domain shifts, the approach achieves state-of-the-art mean Average Precision with fewer parameters and reduced training time, and the authors release the code for practitioners.
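For two univariate Gaussians, the Fréchet (2-Wasserstein) distance has a closed form: $d_F^2 = (\mu_1-\mu_2)^2 + (\sigma_1-\sigma_2)^2$. The sketch below applies it component by component to a source and a target two-component GMM. It assumes component 0 lies near 0 and component 1 near 1. Summing the squared per-component distances is our illustrative assumption. The paper's exact loss (weighting, use of mixing coefficients, per-label handling) is not given here, and the function names are hypothetical.

```python
import math
from typing import Sequence, Tuple

# A Gaussian component described by (mean, standard deviation).
Component = Tuple[float, float]


def frechet_gaussian_1d(p: Component, q: Component) -> float:
    """Fréchet (2-Wasserstein) distance between two univariate Gaussians."""
    (mu_p, sd_p), (mu_q, sd_q) = p, q
    return math.sqrt((mu_p - mu_q) ** 2 + (sd_p - sd_q) ** 2)


def gmm_discrepancy(source: Sequence[Component], target: Sequence[Component]) -> float:
    """Hypothetical component-wise discrepancy between two GMMs.

    Components are matched by index (0: near 0, 1: near 1) and their squared
    Fréchet distances are summed. This aggregation is an illustrative
    assumption, not the loss defined in the paper.
    """
    if len(source) != len(target):
        raise ValueError("source and target must have the same number of components")
    return sum(frechet_gaussian_1d(s, t) ** 2 for s, t in zip(source, target))


if __name__ == "__main__":
    # Made-up parameters: the target components sit farther from 0 and 1 and are
    # wider than the source ones, mimicking the trend described for Figure 3.
    src = [(0.05, 0.05), (0.95, 0.05)]
    tgt = [(0.15, 0.10), (0.80, 0.15)]
    print(round(gmm_discrepancy(src, tgt), 4))
```

Matching components by index relies on the two clusters being anchored near 0 and 1. That anchoring is what makes a component-wise comparison meaningful here.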

Abstract

This paper introduces a discriminator-free adversarial-based approach termed DDA-MLIC for Unsupervised Domain Adaptation (UDA) in the context of Multi-Label Image Classification (MLIC). While recent efforts have explored adversarial-based UDA methods for MLIC, they typically include an additional discriminator subnet. Nevertheless, decoupling the classification and the discrimination tasks may harm their task-specific discriminative power. Herein, we address this challenge by presenting a novel adversarial critic directly derived from the task-specific classifier. Specifically, we employ a two-component Gaussian Mixture Model (GMM) to model both source and target predictions, distinguishing between two distinct clusters. Instead of using the traditional Expectation Maximization (EM) algorithm, our approach uses a Deep Neural Network (DNN) to estimate the parameters of each GMM component. Subsequently, the source and target GMM parameters are leveraged to formulate an adversarial loss using the Fréchet distance. The proposed framework is therefore not only fully differentiable but also cost-effective, as it avoids the expensive iterative process usually induced by the standard EM method. The proposed method is evaluated on several multi-label image datasets covering three different types of domain shift. The obtained results demonstrate that DDA-MLIC outperforms existing state-of-the-art methods in terms of mean Average Precision (mAP) while requiring fewer parameters. The code is made publicly available at github.com/cvi2snt/DDA-MLIC.
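For reference, the iterative procedure that DeepEM is meant to replace is the standard EM algorithm. The sketch below fits a one-dimensional, two-component GMM to a list of sigmoid outputs in plain Python. Several choices are illustrative assumptions: the initialization (means at 0.25 and 0.75, variance 0.05), the variance floor and the fixed iteration count. Every iteration revisits all samples in both an E-step and an M-step, which is the repeated cost the abstract refers to. According to the paper, DeepEM instead predicts the GMM parameters with a network in a single forward pass. This block does not attempt to reproduce DeepEM.

```python
import math
from typing import List, Tuple

# (weight, mean, variance) for each of the two components.
Params = List[Tuple[float, float, float]]


def normal_pdf(x: float, mean: float, var: float) -> float:
    """Density of N(mean, var) evaluated at x."""
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def em_two_component(xs: List[float], n_iter: int = 50, var_floor: float = 1e-4) -> Params:
    """Standard EM for a 1-D, two-component GMM (illustrative, not DeepEM)."""
    # Initialization assumption: one component near 0, the other near 1.
    params: Params = [(0.5, 0.25, 0.05), (0.5, 0.75, 0.05)]
    for _ in range(n_iter):
        # E-step: responsibility of each component for each sample.
        resp = []
        for x in xs:
            dens = [w * normal_pdf(x, m, v) for (w, m, v) in params]
            total = sum(dens) or 1e-300  # guard against both densities underflowing
            resp.append([d / total for d in dens])
        # M-step: re-estimate weight, mean and variance from the responsibilities.
        new_params: Params = []
        for k in range(2):
            nk = sum(r[k] for r in resp) + 1e-12  # avoid division by zero
            mean = sum(r[k] * x for r, x in zip(resp, xs)) / nk
            var = sum(r[k] * (x - mean) ** 2 for r, x in zip(resp, xs)) / nk
            new_params.append((nk / len(xs), mean, max(var, var_floor)))
        params = new_params
    return params


if __name__ == "__main__":
    # Toy sigmoid outputs for one label: mostly near 0 or near 1.
    preds = [0.02, 0.05, 0.01, 0.08, 0.03, 0.91, 0.97, 0.88, 0.95]
    for w, m, v in em_two_component(preds):
        print(f"weight={w:.2f} mean={m:.3f} std={math.sqrt(v):.3f}")
```

Because the estimate depends on the loop, backpropagating through it means unrolling every iteration. A single-pass parameter estimator sidesteps this.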

Paper Structure

This paper contains 35 sections, 16 equations, 6 figures, 8 tables.

Figures (6)

  • Figure 1: The work of [daln] cannot be directly applied to MLIC because of the differences between the two tasks [dda-mlic]: (a) single-label image classification uses a softmax activation to convert the predicted logits into probabilities that sum to one across all classes; (b) multi-label image classification, in contrast, uses a sigmoid activation that maps each logit independently to a value between $0$ and $1$, giving higher probabilities to the objects present in an image.
  • Figure 2: Histogram of classifier predictions. Predicted probabilities of a source-only trained classifier on: (a) the source dataset $(\mathcal{I}_s)$, and (b) the target dataset $(\mathcal{I}_t)$.
  • Figure 3: (a) The classifier predictions $\mathbf{z}_s$ and $\mathbf{z}_t$ for the source and target datasets, respectively, can be grouped into two clusters. Hence, a two-component GMM can be fitted to both the source ($\hat{P}_s$) and target ($\hat{P}_t$) predictions, with the first component close to 0 and the second close to 1. (b) A component-wise comparison between the source ($\hat{P}_s^1, \hat{P}_s^2$) and target ($\hat{P}_t^1, \hat{P}_t^2$) Gaussian components of the fitted GMMs confirms that target predictions are likely to lie farther from 0 and 1, with a higher standard deviation than the source predictions.
  • Figure 4: The overall architecture of DDA-MLIC with the proposed DeepEM block consists of the following components: The feature extractor ($f_g$) learns discriminative features from both source and target images. The task classifier ($f_c$) performs two actions simultaneously: 1) it learns to accurately classify source samples using a supervised task loss $\mathcal{L}_{cls}(\mathcal{D}_s)$, and 2) when acting as a discriminator, it aims to minimize the proposed GMM-based discrepancy $\mathcal{L}_{adv}(\mathcal{D}_s, \mathcal{I}_t)$ between source $(\mathbf{z}_s)$ and target $(\mathbf{z}_t)$ predictions using the proposed DeepEM block, while $f_g$ simultaneously works to maximize it.
  • Figure 5: Comparison of the average training time per batch with and without DeepEM.
  • ...and 1 more figure
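The distinction drawn in Figure 1 can be checked numerically. The sketch below applies softmax and sigmoid to the same logit vector. Softmax outputs always sum to one, so classes compete for probability mass. Sigmoid scores each class independently, so several labels can be close to 1 at once. The logit values are arbitrary illustrative numbers, not taken from the paper.

```python
import math
from typing import List


def softmax(logits: List[float]) -> List[float]:
    # Subtract the max logit for numerical stability; the result is unchanged.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def _sigmoid_scalar(z: float) -> float:
    # Branch on the sign so math.exp never receives a large positive argument.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def sigmoid(logits: List[float]) -> List[float]:
    # Each logit is squashed into the 0-1 range independently of the others.
    return [_sigmoid_scalar(z) for z in logits]


if __name__ == "__main__":
    logits = [3.0, 2.5, -4.0]  # e.g. two objects present, one absent
    print("softmax:", [round(p, 3) for p in softmax(logits)], "sum =", round(sum(softmax(logits)), 6))
    print("sigmoid:", [round(p, 3) for p in sigmoid(logits)])
```

With softmax, the two present objects split the probability mass between them. With sigmoid, both receive high scores, which is the behaviour multi-label classification needs.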