The Mean is the Mirage: Entropy-Adaptive Model Merging under Heterogeneous Domain Shifts in Medical Imaging

Sameer Ambekar; Reza Nasirigerdeh; Peter J. Schuffler; Lina Felsner; Daniel M. Lang; Julia A. Schnabel

TL;DR

This work introduces an entropy-adaptive, fully online model-merging method that yields a batch-specific merged model via only forward passes, effectively leveraging target information, and demonstrates why mean merging is prone to failure and misaligned under heterogeneous domain shifts.

Abstract

Model merging under unseen test-time distribution shifts often renders naive strategies, such as mean averaging, unreliable. This challenge is especially acute in medical imaging, where models are fine-tuned locally at clinics on private data, producing domain-specific models that differ by scanner, protocol, and population. When deployed at an unseen clinical site, test cases arrive in unlabeled, non-i.i.d. batches, and the model must adapt immediately without labels. In this work, we introduce an entropy-adaptive, fully online model-merging method that yields a batch-specific merged model using only forward passes, effectively leveraging target information. We further demonstrate why mean merging is prone to failure and misalignment under heterogeneous domain shifts. We then mitigate encoder-classifier mismatch by decoupling the encoder and the classification head and merging them with separate coefficients. We extensively evaluate our method against state-of-the-art baselines using two backbones across nine medical and natural-domain generalization image classification datasets, showing consistent gains in both standard evaluation and challenging scenarios. These gains are achieved while retaining single-model inference at test time.
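To make the idea in the abstract concrete, the following is a minimal sketch of entropy-driven, forward-pass-only merging. It is not the paper's algorithm: the softmax over negative mean entropies, the `temperature` parameter, the flat parameter lists, and all function names are assumptions chosen for illustration. The only elements taken from the abstract are that coefficients are derived from prediction entropy on an unlabeled batch and that the encoder and classification head receive separate coefficients.

```python
import math

def mean_entropy(probs_batch):
    """Average Shannon entropy (in nats) over a batch of probability vectors."""
    total = 0.0
    for probs in probs_batch:
        total += -sum(p * math.log(p) for p in probs if p > 0.0)
    return total / len(probs_batch)

def entropy_coefficients(entropies, temperature=1.0):
    """Assumed rule: softmax over negative entropies, so a source model
    that is more confident on the batch receives a larger coefficient."""
    scores = [-h / temperature for h in entropies]
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    norm = sum(exps)
    return [e / norm for e in exps]

def merge(params_per_model, coeffs):
    """Convex combination of flat parameter lists, one list per source model."""
    size = len(params_per_model[0])
    return [sum(c * p[i] for c, p in zip(coeffs, params_per_model))
            for i in range(size)]

# Toy softmax outputs of two source models on one unlabeled target batch.
preds_a = [[0.9, 0.1], [0.8, 0.2]]   # confident model
preds_b = [[0.5, 0.5], [0.5, 0.5]]   # maximally uncertain model
entropies = [mean_entropy(preds_a), mean_entropy(preds_b)]

# Hypothetical decoupling: a sharper temperature for the head than the encoder.
enc_coeffs = entropy_coefficients(entropies, temperature=1.0)
head_coeffs = entropy_coefficients(entropies, temperature=0.1)

merged_encoder = merge([[1.0, 2.0], [3.0, 4.0]], enc_coeffs)
merged_head = merge([[0.5], [-0.5]], head_coeffs)
```

In this toy setting the confident model dominates both coefficient sets, and the sharper head temperature pushes the head coefficients closer to a one-hot selection. Only forward-pass outputs are needed to compute the coefficients, and the result is a single merged parameter set per batch.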

Paper Structure (39 sections, 9 equations, 5 figures, 3 tables, 2 algorithms)

Introduction
Related work
Background and Notations
Methodology
Why mean merging is prone to failure and misalignment
Entropy-adaptive merging
Decoupling encoder and classification head coefficients
Encoder coefficients.
Classification head coefficients.
Experiments
Additional experiments and Benefits
Conclusion and Outlook
Additional Datasets details
Additional Related work
Additional Implementation details
...and 24 more sections

Figures (5)

Figure 1: Mean merging and Entropy-Adaptive merging (Ours). (a) Mean merging averages independently trained hospital models into a static model that does not take target information into account and can therefore fail on mixed target batches. (b) Our method adaptively uses the entropy of each unlabeled target batch to compute per-batch merging coefficients along linear mode connectivity directions, producing batch-specific merged models that leverage target information. Additional illustration with loss heatmaps in Fig. \ref{fig:weight_spaces}.
Figure 2: Why mean merging fails on the PACS dataset, using ViT-B32 domain-specific models trained on the Photo, Cartoon, and Sketch domains and evaluated on the Art domain. We first compute the angles between the weights and bias terms of the domain-specific models, following jang2024model, and then compute directional and angular drift for further insight. (A) Data diversity: Model parameters drift in different directions and at different scales across domains, with the classification head showing the strongest misalignment. (B) Encoder-classification head mismatch: Mean layer-wise angular drift. Misalignment increases with network depth and is most pronounced at the classification head. (C) Fixed mean merging is prone to failure: Mean merging leads to substantial signal loss in the classification head compared with the individual models.
Figure 3: Illustration of the loss landscape. (a) Mean merging: The linear path between models crosses a high-loss barrier because straightforward mean connectivity assumes compatible representations across all the layers. (b) Our method learns adaptive merging coefficients that follow a lower-loss path, resolving feature incompatibility near the unseen target optimum.
Figure 4: Consistent improvements under Dirichlet and temporal sampling. Mean accuracy (%) on PACS and Organs under Dirichlet splits ($\alpha \in \{0.05, 0.50\}$) and temporally correlated data. Our Entropy-Adaptive method (green) consistently outperforms the baselines, with larger gains under the stronger skew ($\alpha=0.05$).
Figure 5: Layer-wise misalignment of domain-specific models. Pairwise parameter angles between domain-specific models across ViT and classification head components show that encoder layers are largely aligned, while the classification head (C) is strongly misaligned, motivating head-aware merging.
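The captions for Figures 2 and 5 rely on pairwise angles between the parameters of domain-specific models, and on the observation that averaging misaligned parameters loses signal. The sketch below illustrates both quantities on toy vectors. The flattened two-dimensional "parameter vectors" and the function names are hypothetical, and the exact angle definition used in jang2024model may differ (for example, it may be measured relative to pre-trained weights).

```python
import math

def angle_degrees(u, v):
    """Angle between two flattened parameter vectors, in degrees."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    # Clamp to [-1, 1] to guard against floating-point rounding.
    cosine = max(-1.0, min(1.0, dot / (norm_u * norm_v)))
    return math.degrees(math.acos(cosine))

def norm_of_mean(vectors):
    """Euclidean norm of the element-wise mean of several vectors."""
    size = len(vectors[0])
    mean = [sum(v[i] for v in vectors) / len(vectors) for i in range(size)]
    return math.sqrt(sum(x * x for x in mean))

# Two unit-norm toy "layers": one pair aligned, one pair orthogonal.
aligned = [[1.0, 0.0], [1.0, 0.0]]
orthogonal = [[1.0, 0.0], [0.0, 1.0]]

aligned_angle = angle_degrees(*aligned)
orthogonal_angle = angle_degrees(*orthogonal)
aligned_norm = norm_of_mean(aligned)        # norm is preserved
orthogonal_norm = norm_of_mean(orthogonal)  # norm shrinks after averaging
```

For two unit vectors separated by an angle $\theta$, the norm of their mean is $\cos(\theta/2)$, so the larger the angle between two models' parameters, the more magnitude a plain average discards. This is one simple way to read the signal loss that the Figure 2 caption attributes to the classification head.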
