Pre-trained Models Succeed in Medical Imaging with Representation Similarity Degradation

Wenqiang Zu, Shenghao Xie, Hao Chen, Lei Ma

TL;DR

This work addresses why pre-trained models retain effectiveness in medical imaging despite cross-domain gaps by introducing a similarity-space framework that tracks representation similarity $M_X(\theta^*, \theta)$ and accuracy divergence $A_X(\theta^*, \theta)$ during fine-tuning. It evaluates three backbones (ViT, OpenCLIP, DINOv2) across multiple medical datasets using metrics such as centered kernel alignment (CKA), expected calibration error (ECE), and k-NN accuracy, and it formulates a formal space $T_X(\eta, \epsilon)$ to characterize viable fine-tuned parameters. There are three key findings: high-performing models can maintain accuracy while undergoing substantial similarity changes; similarity and representation quality metrics are related by a robust linear relationship; and supervised pre-training generally resists forgetting better than self-supervised pre-training. These insights have practical implications for transfer-learning strategies beyond medical imaging, enabling predictive model selection and more principled fine-tuning of pre-trained representations.
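As a minimal sketch of how a representation-similarity score such as CKA could be computed between pre-trained and fine-tuned features, the snippet below implements the linear variant of CKA in NumPy. The linear variant, the function name `linear_cka`, and the random stand-in features are assumptions made for illustration; this summary does not state which CKA variant or which layers the authors use.

```python
import numpy as np


def linear_cka(feats_a, feats_b):
    """Linear CKA between two feature matrices of shape (n_samples, dim).

    Both matrices must describe the same n_samples inputs, in the same order.
    """
    # Center each feature dimension over the samples.
    a = feats_a - feats_a.mean(axis=0, keepdims=True)
    b = feats_b - feats_b.mean(axis=0, keepdims=True)
    # Squared Frobenius norm of the cross-covariance, normalized by the
    # Frobenius norms of the two self-covariances.
    cross = np.linalg.norm(b.T @ a, ord="fro") ** 2
    norm_a = np.linalg.norm(a.T @ a, ord="fro")
    norm_b = np.linalg.norm(b.T @ b, ord="fro")
    return float(cross / (norm_a * norm_b))


# Synthetic stand-ins for features extracted from the same images
# (hypothetical data, not taken from the paper).
rng = np.random.default_rng(0)
pretrained = rng.normal(size=(64, 16))
rotation, _ = np.linalg.qr(rng.normal(size=(16, 16)))
rotated = pretrained @ rotation          # same geometry, rotated basis
unrelated = rng.normal(size=(64, 16))    # independent features

cka_same = linear_cka(pretrained, pretrained)
cka_rotated = linear_cka(pretrained, rotated)
cka_unrelated = linear_cka(pretrained, unrelated)
print(cka_same, cka_rotated, cka_unrelated)
```

Linear CKA is invariant to orthogonal transformations of the feature space, so the rotated copy scores the same as the original, while independent features score much lower. In practice the two inputs would be activations of the pre-trained and fine-tuned model on the same batch of images.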

Abstract

This paper investigates the critical problem of representation similarity evolution during cross-domain transfer learning, with particular focus on understanding why pre-trained models maintain effectiveness when adapted to medical imaging tasks despite significant domain gaps. The study establishes a rigorous problem definition centered on quantifying and analyzing representation similarity trajectories throughout the fine-tuning process, while carefully delineating the scope to encompass both medical image analysis and broader cross-domain adaptation scenarios. Our empirical findings reveal three critical discoveries: the potential existence of high-performance models that preserve both task accuracy and representation similarity to their pre-trained origins; a robust linear correlation between layer-wise similarity metrics and representation quality indicators; and distinct adaptation patterns that differentiate supervised versus self-supervised pre-training paradigms. The proposed similarity space framework not only provides mechanistic insights into knowledge transfer dynamics but also raises fundamental questions about optimal utilization of pre-trained models. These results advance our understanding of neural network adaptation processes while offering practical implications for transfer learning strategies that extend beyond medical imaging applications. The code will be available once accepted.

Paper Structure

This paper contains 11 sections, 6 equations, 5 figures, 2 tables.

Figures (5)

  • Figure 1: (a) k-NN evaluation results on ImageNet-1K comparing fine-tuned models (fine-tuned on ISIC2018, blue bars) with pre-trained models (gray bars). (b) Similarity distribution curves for models fine-tuned on the ISIC2018 dataset [codella2019skin] (20 runs per method), recording the final representation similarity with the pre-trained models.
  • Figure 2: Similarity space: (a) Fine-tuned models show varied similarity levels in pre-trained parameter space. (b) Models within the similarity space retain both task performance and similarity to the pre-trained model.
  • Figure 3: Attention maps of pre-trained and fine-tuned models are visualized using images from ImageNet-1K. Fine-tuning is performed on the ISIC2018 dataset, with CLS indicating the CLS token and Feat denoting average patch features.
  • Figure 4: Accuracy and similarity results of fine-tuned models on ISIC2018 dataset for 20 runs. (a) The average training trajectory of 20 epochs across the 20 runs. (b) The distribution of the final similarity and accuracy of these fine-tuned models.
  • Figure 5: The linear correlation between similarity degree (CKA) and representation metrics (ECE) of models fine-tuned on ISIC2018 is analyzed. Similarity degrees are discretized (black points), and linear correlation is computed on interval averages (red points), with each model’s similarity assigned to the nearest interval.
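The binning procedure described in the Figure 5 caption can be sketched as follows. This is an illustrative reading of the caption, not the authors' code: the function name `binned_linear_fit`, the evenly spaced interval centers, the number of intervals, and the synthetic CKA/ECE values (including the sign of their trend) are all assumptions.

```python
import numpy as np


def binned_linear_fit(similarity, metric, num_bins=10):
    """Fit a line to interval averages of (similarity, metric) pairs."""
    similarity = np.asarray(similarity, dtype=float)
    metric = np.asarray(metric, dtype=float)
    # Evenly spaced interval centers over the observed similarity range
    # (assumed; the caption does not say how intervals are chosen).
    centers = np.linspace(similarity.min(), similarity.max(), num_bins)
    # Assign each model to its nearest interval center.
    nearest = np.abs(similarity[:, None] - centers[None, :]).argmin(axis=1)
    occupied = np.unique(nearest)  # skip intervals that received no model
    bin_sim = np.array([similarity[nearest == k].mean() for k in occupied])
    bin_metric = np.array([metric[nearest == k].mean() for k in occupied])
    # Linear fit and Pearson correlation on the interval averages.
    slope, intercept = np.polyfit(bin_sim, bin_metric, deg=1)
    pearson_r = np.corrcoef(bin_sim, bin_metric)[0, 1]
    return slope, intercept, pearson_r


# Synthetic example: 200 hypothetical fine-tuned models whose ECE depends
# linearly on CKA plus noise. These numbers are made up for illustration.
rng = np.random.default_rng(0)
cka = rng.uniform(0.3, 0.9, size=200)
ece = 0.30 - 0.25 * cka + rng.normal(scale=0.01, size=200)

slope, intercept, pearson_r = binned_linear_fit(cka, ece)
print(slope, intercept, pearson_r)
```

Averaging within intervals before fitting reduces per-run noise, which is one plausible reason to report the correlation on interval averages rather than on individual runs.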