Gradient-based Class Weighting for Unsupervised Domain Adaptation in Dense Prediction Visual Tasks

Roberto Alcover-Couso; Marcos Escudero-Viñolo; Juan C. SanMiguel; Jesus Bescós

TL;DR

The paper tackles the problem of severe class imbalance in unsupervised domain adaptation for dense prediction tasks, where domain shifts between synthetic sources and real targets skew learning toward frequent classes. It introduces Gradient-based class weighting (GBW), which dynamically computes per-class weights from gradient magnitudes of per-class losses and solves a constrained quadratic program to obtain nonnegative weights that sum to the number of classes; weights can be updated at every training step and integrated with pseudo-label weighting. GBW yields consistent recall gains for underrepresented classes and improves overall performance across semantic and panoptic segmentation, using both CNN and transformer architectures and multiple UDA strategies (adversarial, self-training, entropy minimization) without requiring target priors. The approach also complements data-level imbalance techniques, providing a practical building block to narrow the gap between UDA and supervised performance in dense vision tasks.
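As a rough illustration of the weighting step summarized above, the following minimal sketch computes nonnegative per-class weights that sum to the number of classes from per-class squared gradient norms. The exact quadratic program is not given on this page, so the sketch assumes a diagonal objective, minimizing $\sum_c g_c v_c^2$ subject to $\sum_c v_c = C$ and $v_c \geq 0$, where $g_c$ is the squared gradient norm of the loss of class $c$. Under that assumption the solution has the closed form $v_c \propto 1/g_c$. The function name `gbw_weights_sketch` and the `eps` parameter are hypothetical and are not taken from the paper.

```python
import numpy as np


def gbw_weights_sketch(grad_sq_norms, eps=1e-12):
    """Per-class weights from squared gradient norms (illustrative only).

    ASSUMPTION: the quadratic program is diagonal,
        minimize   sum_c g_c * v_c**2
        subject to sum_c v_c = C,  v_c >= 0,
    which is not confirmed by the source. Its closed-form solution is
    v_c = C * (1 / g_c) / sum_k (1 / g_k).
    """
    g = np.asarray(grad_sq_norms, dtype=np.float64)
    if g.ndim != 1 or g.size == 0:
        raise ValueError("expected a non-empty 1-D array of squared norms")
    if np.any(g < 0):
        raise ValueError("squared gradient norms must be nonnegative")
    inv = 1.0 / (g + eps)  # eps guards against division by zero
    # Rescale so the weights sum to the number of classes C.
    return g.size * inv / inv.sum()


# Three classes: the class with the smallest gradient norm gets the
# largest weight under the assumed objective.
weights = gbw_weights_sketch([4.0, 1.0, 0.25])
print(weights, weights.sum())
```

In a training loop these weights would be recomputed at every step and used to scale the per-class loss terms. Whether smaller gradient norms should map to larger weights in exactly this way depends on the paper's actual formulation, which this page does not spell out.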

Abstract

In unsupervised domain adaptation (UDA), where models are trained on source data (e.g., synthetic) and adapted to target data (e.g., real-world) without target annotations, significant class imbalance remains an open issue. Despite considerable progress in bridging the domain gap, existing methods often suffer performance degradation on highly imbalanced dense prediction visual tasks such as semantic and panoptic segmentation. The problem is especially pronounced because the source and target domains lack equivalent class priors, which renders class-imbalance techniques used in other areas (e.g., image classification) ineffective in UDA scenarios. This paper proposes a class-imbalance mitigation strategy that incorporates class weights into the UDA learning losses, with the novelty of estimating these weights dynamically from the loss gradient, defining Gradient-based class weighting (GBW) learning. GBW naturally increases the contribution of classes whose learning is hindered by highly represented classes, and it adapts automatically and quickly to the outcome of each training iteration, avoiding the explicit curriculum learning schedules common in loss-weighting strategies. Extensive experimentation validates the effectiveness of GBW across architectures (convolutional and transformer), UDA strategies (adversarial, self-training and entropy minimization), tasks (semantic and panoptic segmentation), and datasets (GTA and Synthia). An analysis of the source of this advantage shows that GBW consistently increases the recall of under-represented classes.

Paper Structure (25 sections, 12 equations, 6 figures, 5 tables)

Introduction
Related Work
Unsupervised Domain Adaptation: UDA
Handling class imbalance in UDA
Method
The learning of dense prediction visual tasks
Gradient based weighting (GBW)
Combination with previous per-sample weighting
Experimental Exploration
Setup
Semantic Segmentation
Panoptic Segmentation
Training parameters
GBW for UDA in Semantic Segmentation
Incorporation of GBW into UDA methods
...and 10 more sections

Figures (6)

Figure 1: UDA driven by the MIC [hoyer2023mic] ($1^{st}$ row) and EDAPS [edaps] ($2^{nd}$ and $3^{rd}$ rows) methods is biased toward more populated (frequent or larger) classes on the source dataset (c), misclassifying instances of less frequent or smaller classes: false-positive examples include instances of train, car and person misclassified as truck, road and car, respectively. (d) GBW improves the classification of under-represented classes.
Figure 2: Averaged per-class weights ($v_c$) of GBW on the GTA-Cityscapes framework for the HRDA method throughout the UDA training process. One panel shows the evolution for some coarse classes and another depicts the complementary evolution for person and rider.
Figure 3: Semantic segmentation using GBW. Given a forward pass of an image and the respective per-class losses ($l_c$), the gradients ($||\nabla_{\theta_t} \overline{l_c}||^2$) are first estimated from the gradient of the last layer (shaded in orange) to compute the per-class weights $\mathbf{v}$. Second, the backward pass is performed with respect to the weighted cross-entropy $\mathcal{L}(\hat{\mathbf{y}}_{i,t},y_i;\mathbf{v}_t)$.
Figure 4: GBW per-class performance analysis.
Figure 5: Qualitative comparison. Column-wise: image, ground truth, model trained with and without GBW. Semantic segmentation ($1^{st}$ row). Panoptic segmentation ($2^{nd}$ row).
...and 1 more figure
