Training-Free Model Merging for Multi-target Domain Adaptation

Wenyi Li, Huan-ang Gao, Mingju Gao, Beiwen Tian, Rong Zhi, Hao Zhao

TL;DR

The paper tackles multi-target domain adaptation (MTDA) under data-access restrictions by proposing a training-free framework that merges independently adapted models. It shows that merging parameters along a linear path, together with merging batch-normalization (BN) buffers under a Gaussian prior, yields a single robust model without accessing training data. Linear merging suffices because the shared pretrained weights facilitate linear mode connectivity between the adapted models. The approach achieves competitive, and often superior, harmonic-mean performance relative to baselines trained on the combined target data and to state-of-the-art MTDA methods, across multiple backbones and target domains. This work highlights the practical potential of data-free domain adaptation and BN-statistics-aware merging to reduce bandwidth and privacy concerns while maintaining strong cross-domain robustness.
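
As a minimal sketch of what "merging along a linear path" means, the Python function below forms a weighted average of corresponding parameters from models that share one architecture. It assumes parameters are stored as plain name-to-NumPy-array dictionaries, which stand in hypothetically for a framework state dict. The name `linear_merge` and the uniform default weights are illustrative choices, not the authors' code. With two models and uniform weights, it reduces to the mid-point merge.

```python
import numpy as np

def linear_merge(state_dicts, weights=None):
    """Merge parameters along the linear path: theta = sum_i w_i * theta_i.

    state_dicts: list of dicts mapping parameter name -> np.ndarray
                 (hypothetical stand-in for a framework state dict).
    weights:     optional non-negative floats summing to 1; uniform by default,
                 which for two models is the mid-point merge.
    """
    n = len(state_dicts)
    if weights is None:
        weights = [1.0 / n] * n
    if not np.isclose(sum(weights), 1.0):
        raise ValueError("weights must sum to 1")
    keys = state_dicts[0].keys()
    for sd in state_dicts[1:]:
        if sd.keys() != keys:
            raise ValueError("all models must share the same parameter names")
    # Element-wise weighted average of each parameter tensor.
    return {k: sum(w * sd[k] for w, sd in zip(weights, state_dicts)) for k in keys}
```

The sketch only makes sense when the models start from the same pretrained weights. Without that, corresponding entries need not play the same role, which is why permutation-alignment methods such as Git Re-Basin exist.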

Abstract

In this paper, we study multi-target domain adaptation of scene understanding models. While previous methods achieved commendable results through inter-domain consistency losses, they often assumed unrealistic simultaneous access to images from all target domains, overlooking constraints such as data transfer bandwidth limitations and data privacy concerns. Given these challenges, we pose the question: How can we merge models adapted independently on distinct domains while bypassing the need for direct access to training data? Our solution to this problem involves two components: merging model parameters and merging model buffers (i.e., normalization layer statistics). For merging model parameters, empirical analyses of mode connectivity surprisingly reveal that linear merging suffices when the separate models are adapted from the same pretrained backbone weights. For merging model buffers, we model the real-world distribution with a Gaussian prior and estimate new statistics from the buffers of separately trained models. Our method is simple yet effective, achieving performance comparable to baselines trained on the combined data while eliminating the need for access to training data. Project page: https://air-discover.github.io/ModelMerging
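
The buffer-merging step can be illustrated with a small sketch. This sketch assumes, for illustration only, that each model's BN running mean and variance describe a per-channel Gaussian over the features of its own target domain. Under that assumption, one natural estimate of the merged statistics is the mean and variance of the pooled mixture, obtained via the law of total variance. The function name `merge_bn_buffers` and the optional `counts` weights are hypothetical, and the paper's exact estimator may differ.

```python
import numpy as np

def merge_bn_buffers(means, variances, counts=None):
    """Merge per-channel BN running statistics from several models.

    Illustrative assumption: model i's buffers (mu_i, var_i) describe a Gaussian
    over features of its own domain; the merged buffers are the mean and variance
    of the pooled mixture, weighted by counts (uniform when counts is None).
    """
    means = np.asarray(means, dtype=float)          # shape (num_models, num_channels)
    variances = np.asarray(variances, dtype=float)  # same shape as means
    k = means.shape[0]
    if counts is None:
        w = np.full(k, 1.0 / k)
    else:
        w = np.asarray(counts, dtype=float) / np.sum(counts)
    w = w[:, None]                                  # broadcast weights over channels
    mu = np.sum(w * means, axis=0)                  # mixture mean
    # Law of total variance: within-domain variance plus spread of the domain means.
    var = np.sum(w * (variances + (means - mu) ** 2), axis=0)
    return mu, var
```

Consider two domains whose channel means are 0 and 2, each with unit variance. The pooled variance is 2, whereas naively averaging the two variances would give 1 and understate the spread of the combined features.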

Paper Structure (22 sections, 5 equations, 8 figures, 9 tables)

Introduction
Related Works
Domain Adaptation for Semantic Segmentation
Multi-target Learning with Constrained Data Assumption
Mode Connectivity for Neural Networks
Methodology
Overview
Merging Parameters
Merging Buffers
Experiments
Datasets
Implementation Details
Comparison with Baseline Methods
Comparison with State-of-the-Arts
Extending to More Target Domains
...and 7 more sections

Figures (8)

Figure 1: Comparison of Domain Adaptation Settings. (a) Single Target Domain Adaptation (STDA) focuses on leveraging labeled synthetic data and unlabeled data from a single target domain together for optimal performance in that target domain. (b) Multi-target Domain Adaptation (MTDA) with data access involves utilizing data from target domains together to train a single model capable of excelling across all these domains. (c) MTDA without direct access to training data, employing model merging to enhance robustness.
Figure 2: Overview of Two-stage Pipeline of Our Proposed Multi-target Domain Adaptation Solution. After training STDA methods on separate domains, we integrate models together using our proposed merging techniques.
Figure 3: Results of Git Re-Basin and Mid-Point Merging on Different Backbones. In our domain adaptation scenario, Git Re-Basin [ainsworth2022git] reduces to a straightforward mid-point merge.
Figure 4: Empirical Analysis of Linear Mode Connectivity. (a) Exploring the linear mode connectivity of two trained ResNet101 backbones targeted at two different domains. (b-e) Ablation studies on synthetic data, self-training architecture, initialization weights and pretrained weights to identify the cause of the linear mode connectivity.
Figure 5: Illustration on Merging Statistics in Batch Normalization (BN) Layers.
...and 3 more figures
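
Figure 4(a) probes linear mode connectivity by evaluating models along the straight line between two sets of weights. The sketch below assumes flattened parameter vectors and a user-supplied `loss_fn`, both hypothetical here. It evaluates the loss on a grid of interpolation coefficients and reports the loss barrier under a common definition: the largest gap between the path's loss and the straight line joining the two endpoint losses. A barrier near zero indicates that the two models are linearly mode connected.

```python
import numpy as np

def interpolation_curve(theta_a, theta_b, loss_fn, num_points=11):
    """Evaluate loss_fn at (1 - alpha) * theta_a + alpha * theta_b for alpha in [0, 1]."""
    theta_a = np.asarray(theta_a, dtype=float)
    theta_b = np.asarray(theta_b, dtype=float)
    alphas = np.linspace(0.0, 1.0, num_points)
    losses = np.array([loss_fn((1.0 - a) * theta_a + a * theta_b) for a in alphas])
    return alphas, losses

def loss_barrier(alphas, losses):
    """Max height of the loss curve above the line joining the two endpoint losses."""
    baseline = (1.0 - alphas) * losses[0] + alphas * losses[-1]
    return float(np.max(losses - baseline))

# Toy usage (illustrative only): a 1-D double-well loss with minima at -1 and +1,
# so the straight path between the two minima must climb over the bump at 0.
double_well = lambda t: float(np.sum((t ** 2 - 1.0) ** 2))
alphas, losses = interpolation_curve([-1.0], [1.0], double_well)
print(loss_barrier(alphas, losses))
```

In practice `loss_fn` would load the interpolated weights into a network and evaluate it on held-out data. The toy loss only shows how a non-zero barrier arises when two minima are separated.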
