Mitigating Overfitting in Medical Imaging: Self-Supervised Pretraining vs. ImageNet Transfer Learning for Dermatological Diagnosis

Iván Matas, Carmen Serrano, Miguel Nogales, David Moreno, Lara Ferrándiz, Teresa Ojeda, Begoña Acha

TL;DR

This work addresses overfitting and domain mismatch when ImageNet pretraining is used for dermatology. It compares self-supervised, domain-specific pretraining, in which a variational autoencoder (VAE) with a randomly initialized ConvNeXt-Tiny encoder is trained on dermatological images, against a conventional ImageNet-pretrained backbone, using an identical classifier and a dermatoscopic dataset augmented with ISIC data. Results show that the ImageNet pathway converges quickly but overfits to non-clinical features, whereas the self-supervised approach improves steadily and generalizes better, suggesting that domain-specific pretraining can achieve robust clinical performance with further tuning. The study highlights the importance of tailoring pretraining strategies to medical imaging tasks to make diagnostic support more reliable in real-world settings.

Abstract

Deep learning has transformed computer vision but relies heavily on large labeled datasets and computational resources. Transfer learning, particularly fine-tuning pretrained models, offers a practical alternative; however, models pretrained on natural image datasets such as ImageNet may fail to capture domain-specific characteristics in medical imaging. This study introduces an unsupervised learning framework that extracts high-value dermatological features instead of relying solely on ImageNet-based pretraining. We employ a Variational Autoencoder (VAE) trained from scratch on a proprietary dermatological dataset, allowing the model to learn a structured and clinically relevant latent space. This self-supervised feature extractor is then compared to an ImageNet-pretrained backbone under identical classification conditions, highlighting the trade-offs between general-purpose and domain-specific pretraining. Our results reveal distinct learning patterns. The self-supervised model achieves a final validation loss of 0.110 (-33.33%), while the ImageNet-pretrained model stagnates at 0.100 (-16.67%), indicating overfitting. Accuracy trends confirm this: the self-supervised model improves from 45% to 65% (+44.44%) with a near-zero overfitting gap, whereas the ImageNet-pretrained model reaches 87% (+50.00%) but plateaus at 75% (+19.05%), with its overfitting gap increasing to +0.060. These findings suggest that while ImageNet pretraining accelerates convergence, it also amplifies overfitting on non-clinically relevant features. In contrast, self-supervised learning achieves steady improvements, stronger generalization, and superior adaptability, underscoring the importance of domain-specific feature extraction in medical imaging.
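The percentages quoted in parentheses in the abstract appear to be relative changes from a starting value; the self-supervised accuracy figure (45% to 65%, +44.44%) is consistent with that reading. The sketch below shows the arithmetic. The function names are hypothetical, and the definition of the overfitting gap as training metric minus validation metric is an assumption, since the abstract does not say how the gap is computed or which metric its +0.060 value refers to.

```python
def relative_change(start: float, end: float) -> float:
    """Percent change from start to end, measured relative to start."""
    if start == 0:
        raise ValueError("start must be non-zero")
    return (end - start) / start * 100.0


def overfitting_gap(train_metric: float, val_metric: float) -> float:
    """Training metric minus validation metric.

    Assumed definition for illustration; the abstract does not define the gap.
    """
    return train_metric - val_metric


# Self-supervised model: accuracy improves from 45% to 65%.
print(f"{relative_change(45.0, 65.0):+.2f}%")  # prints +44.44%
```

Under this definition, a gap near zero means training and validation metrics move together, while a gap that grows over epochs is the usual sign of overfitting.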

Paper Structure

This paper contains 11 sections, 2 figures, 2 tables.

Figures (2)

  • Figure 1: Overview of the proposed self-supervised learning framework. (I) The Variational Autoencoder is trained in an unsupervised manner, using a randomly initialized ConvNeXt-Tiny encoder to extract robust feature representations. (II) The trained encoder is then frozen and used as a feature extractor for a classification task. A parallel comparison is conducted using a ConvNeXt-Tiny encoder pretrained on ImageNet. Both feature extractors feed into identical classifier architectures, and their performance is evaluated using the same metrics.
  • Figure 2: Training dynamics of the self-supervised model (Model A) and the ImageNet-pretrained model (Model B). The left side displays the loss and accuracy evolution of Model A, while the right side shows the corresponding trends for Model B. The top panels illustrate the loss progression, while the bottom panels depict the accuracy evolution.
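
For readers unfamiliar with stage (I) of Figure 1, the standard VAE training objective can be written as below. This is the textbook form, assuming a diagonal Gaussian encoder and a standard normal prior; it is not necessarily the exact loss used in the paper. The symbols x (input image), z (latent code), φ and θ (encoder and decoder parameters), μ and σ (encoder outputs), and d (latent dimension) are notation introduced here for illustration.

```latex
\mathcal{L}(\theta, \phi; x)
  = -\,\mathbb{E}_{q_\phi(z \mid x)}\!\left[\log p_\theta(x \mid z)\right]
  + D_{\mathrm{KL}}\!\left(q_\phi(z \mid x) \,\|\, p(z)\right),
\qquad
D_{\mathrm{KL}}\!\left(\mathcal{N}(\mu, \operatorname{diag}(\sigma^2)) \,\|\, \mathcal{N}(0, I)\right)
  = \frac{1}{2} \sum_{j=1}^{d} \left(\mu_j^2 + \sigma_j^2 - \log \sigma_j^2 - 1\right)
```

The first term rewards accurate reconstruction of the input, and the KL term keeps the latent space close to the prior. No labels enter the objective, which is why the encoder can be trained without supervision before it is frozen for stage (II).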