Layer-wise Model Merging for Unsupervised Domain Adaptation in Segmentation Tasks

Roberto Alcover-Couso, Juan C. SanMiguel, Marcos Escudero-Viñolo, Jose M Martínez

TL;DR

The paper tackles the inefficiency of teacher-student and ensemble approaches in unsupervised domain adaptation for segmentation by proposing a cost-free layer-wise model merging method. It introduces an anchor-based, layer-wise weighting scheme that uniformly merges initial backbone layers while biasing final layers toward the anchor, enabling cross-task and cross-architecture knowledge fusion with no extra training or inference cost. The approach is extensively validated across semantic and panoptic segmentation tasks, multiple UDA strategies, and datasets, achieving notable gains such as up to +2.6% mIoU for same-architecture merges, up to +6.8% mIoU for different-architecture merges with a shared backbone, and up to +7% mPQ for cross-task semantic to panoptic merging. These results demonstrate the practical potential of cost-free, checkpoint-based layer-wise merging to enhance robustness and performance in UDA without additional compute, encouraging broader adoption in segmentation and beyond.

Abstract

Merging parameters of multiple models has resurfaced as an effective strategy to enhance task performance and robustness, but prior work is limited by the high costs of ensemble creation and inference. In this paper, we leverage the abundance of freely accessible trained models to introduce a cost-free approach to model merging. It focuses on a layer-wise integration of merged models, aiming to maintain the distinctiveness of the task-specific final layers while unifying the initial layers, which are primarily associated with feature extraction. This approach ensures parameter consistency across all layers, essential for boosting performance. Moreover, it facilitates seamless integration of knowledge, enabling effective merging of models from different datasets and tasks. Specifically, we investigate its applicability in Unsupervised Domain Adaptation (UDA), an unexplored area for model merging, for Semantic and Panoptic Segmentation. Experimental results demonstrate substantial UDA improvements without additional costs for merging same-architecture models from distinct datasets ($\uparrow 2.6\%$ mIoU) and different-architecture models with a shared backbone ($\uparrow 6.8\%$ mIoU). Furthermore, merging Semantic and Panoptic Segmentation models increases mPQ by $\uparrow 7\%$. These findings are validated across a wide variety of UDA strategies, architectures, and datasets.
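
The layer-wise idea described above (unify the early feature-extraction layers, keep the task-specific final layers close to one reference model) can be illustrated with a minimal sketch. The abstract does not give the exact weighting schedule, so the linear ramp in `anchor_weight`, the function names, and the flattened-list representation of parameters are all assumptions made for illustration, not the authors' formulation.

```python
from typing import Dict, List, Sequence

# Hypothetical representation: parameter name -> flattened list of values.
StateDict = Dict[str, List[float]]


def anchor_weight(layer_idx: int, num_layers: int, num_models: int) -> float:
    """Weight given to the anchor model at one layer.

    ASSUMPTION: a linear ramp over depth. The first layer gets the uniform
    weight 1/num_models; the last layer gets 1.0, so the anchor's final
    layer is kept unchanged.
    """
    uniform = 1.0 / num_models
    if num_layers == 1:
        return 1.0
    depth = layer_idx / (num_layers - 1)  # 0.0 at the first layer, 1.0 at the last
    return uniform + (1.0 - uniform) * depth


def layerwise_merge(anchor: StateDict,
                    others: Sequence[StateDict],
                    layer_order: Sequence[str]) -> StateDict:
    """Merge `others` into `anchor`, one layer at a time.

    Layers that the other models lack, or hold with a different size
    (for example a different segmentation head), are copied from the anchor.
    """
    merged: StateDict = {}
    for idx, name in enumerate(layer_order):
        a = anchor[name]
        donors = [m[name] for m in others
                  if name in m and len(m[name]) == len(a)]
        if not donors:
            merged[name] = list(a)
            continue
        w_anchor = anchor_weight(idx, len(layer_order), len(donors) + 1)
        w_other = (1.0 - w_anchor) / len(donors)  # remaining mass, shared equally
        merged[name] = [
            w_anchor * a_val + w_other * sum(d[i] for d in donors)
            for i, a_val in enumerate(a)
        ]
    return merged


# Toy usage: two "models" with one backbone layer and one head layer.
anchor_model = {"backbone.0": [0.0, 2.0], "head": [1.0]}
other_model = {"backbone.0": [2.0, 4.0], "head": [5.0]}
merged_model = layerwise_merge(anchor_model, [other_model], ["backbone.0", "head"])
```

In this toy case the backbone layer is averaged uniformly while the head stays equal to the anchor's. Because merging only combines stored parameters, the merged model has the same size and inference cost as a single model.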

Paper Structure (17 sections, 1 equation, 6 figures, 9 tables)


Figures (6)

  • Figure 1: Performance comparison for the UDA Synthia-to-Cityscapes setup. Most state-of-the-art UDA methods build on DAFormer [hoyer2022daformer] by adding computationally expensive blocks. In contrast, we reuse the checkpoints obtained during training to improve performance without incurring any training or inference overhead.
  • Figure 2: Layer-wise discrepancy between models under two scenarios. The Checkpoint training scenario assumes a single training run with four saved checkpoints and compares the first checkpoint against the other three. The Heterogeneous training scenario assumes two models trained with different UDA strategies (discrepancy and adversarial, as defined by [vu2019advent]) and compares the best model obtained with each strategy. The parameters considered include convolutional ones (weights and biases) and batch normalization ones (mean and variance). All results are obtained with the DeepLabV2 architecture [7913730] on the GTA-to-Cityscapes setup. For visualization purposes, a dotted line marks the boundary between the last backbone parameter and the first segmentation head parameter.
  • Figure 3: Performance comparison of model merging methods for different numbers of checkpoints of convolutional models on the GTA-to-Cityscapes setup. A checkpoint count of one corresponds to the final checkpoint of the UDA training strategy [vu2019advent] with the DeepLabV2 architecture [7913730], without merging. Counts greater than one correspond to merging the final checkpoint with an increasing number of additional checkpoints.
  • Figure 4: Qualitative comparison with the state-of-the-art model MIC [hoyer2023mic] on the GTA-to-Cityscapes UDA semantic segmentation setup for hard samples of coarse classes: a sidewalk similar in color to the road (first two rows) and a stone-paved road (other four rows). "Ours" denotes the model obtained by merging MIC checkpoints; it adds no training or inference cost relative to the MIC model. By columns, each row shows the color image, the ground-truth labels, the MIC segmentation, and our segmentation.
  • Figure 5: In-depth analysis of layer-wise merging of an adversarial method [Wang2020] and an entropy minimization method [vu2019advent] on the DeepLabV2 architecture [7913730].
  • ...and 1 more figure
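
The layer-wise comparison summarized in the Figure 2 caption can be sketched as a per-layer distance between two sets of parameters. The caption does not state which discrepancy measure is used, so the mean absolute difference below, the function name `layer_discrepancy`, and the flattened-list parameter representation are assumptions chosen for illustration only.

```python
from typing import Dict, List

# Hypothetical representation: parameter name -> flattened list of values.
StateDict = Dict[str, List[float]]


def layer_discrepancy(model_a: StateDict, model_b: StateDict) -> Dict[str, float]:
    """Mean absolute difference per shared layer.

    ASSUMPTION: mean absolute difference stands in for the (unspecified)
    discrepancy measure. Layers missing from `model_b`, empty layers, and
    layers with mismatched sizes are skipped.
    """
    result: Dict[str, float] = {}
    for name, a in model_a.items():
        b = model_b.get(name)
        if b is None or not a or len(a) != len(b):
            continue
        result[name] = sum(abs(x - y) for x, y in zip(a, b)) / len(a)
    return result


# Toy usage: one backbone layer and one head layer.
first_ckpt = {"conv1.weight": [1.0, 2.0], "head.bias": [0.0]}
later_ckpt = {"conv1.weight": [1.5, 1.0], "head.bias": [3.0]}
per_layer = layer_discrepancy(first_ckpt, later_ckpt)
```

Plotting such values in layer order, with a marker at the backbone/head boundary, would give a profile of the kind the caption describes.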