Comparison of fine-tuning strategies for transfer learning in medical image classification
Ana Davila, Jacinto Colan, Yasuhisa Hasegawa
TL;DR
This study systematically compares eight fine-tuning strategies for transferring pre-trained CNNs to medical image classification across six datasets spanning X-ray, MRI, histology, dermoscopy, and endoscopy. Using three backbones (ResNet-50, DenseNet-121, and VGG-19), the authors show that no single method is universally optimal: linear probing alone is often weak, while LP-FT (linear probing combined with full fine-tuning) and Auto-RGN frequently yield robust gains, and DenseNet-121 benefits particularly from non-standard fine-tuning approaches. Auto-RGN, which dynamically adjusts per-layer learning rates, improves performance by up to 11% in some modalities. The findings give practitioners practical guidance on choosing transfer learning strategies for diverse medical imaging tasks and point to future work covering additional architectures and fine-tuning methods.
Abstract
In the context of medical imaging and machine learning, one of the most pressing challenges is the effective adaptation of pre-trained models to specialized medical contexts. Despite the availability of advanced pre-trained models, their direct application to the highly specialized and diverse field of medical imaging often falls short due to the unique characteristics of medical data. This study provides a comprehensive analysis of the performance of various fine-tuning methods applied to pre-trained models across a spectrum of medical imaging domains, including X-ray, MRI, histology, dermoscopy, and endoscopic surgery. We evaluated eight fine-tuning strategies, including standard techniques such as fine-tuning all layers or fine-tuning only the classifier layers, alongside methods such as gradually unfreezing layers, regularization-based fine-tuning, and adaptive learning rates. We selected three well-established CNN architectures (ResNet-50, DenseNet-121, and VGG-19) to cover a range of learning and feature extraction scenarios. Although our results indicate that the efficacy of these fine-tuning methods varies significantly with both the architecture and the medical imaging type, combining Linear Probing with Full Fine-tuning resulted in notable improvements in over 50% of the evaluated cases, demonstrating general effectiveness across medical domains. Moreover, Auto-RGN, which dynamically adjusts learning rates, led to performance enhancements of up to 11% for specific modalities. Additionally, the DenseNet architecture showed more pronounced benefits from alternative fine-tuning approaches than from traditional full fine-tuning. This work not only provides valuable insights for optimizing pre-trained models in medical image analysis but also suggests the potential for future research into more advanced architectures and fine-tuning methods.
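
To make the idea of dynamically adjusted per-layer learning rates concrete, the following is a minimal sketch and not the authors' implementation. It assumes that Auto-RGN scales a base learning rate by each layer's relative gradient norm (gradient norm divided by parameter norm), normalized by the largest value across layers; the abstract itself does not state this rule. The function names, the layer names, and the toy numbers are all hypothetical and chosen only for illustration.

```python
import math


def l2_norm(values):
    """Euclidean norm of a flat list of floats."""
    return math.sqrt(sum(v * v for v in values))


def relative_gradient_norms(params, grads, eps=1e-12):
    """Per-layer ratio ||grad|| / ||param||; eps guards against division by zero."""
    return {
        name: l2_norm(grads[name]) / (l2_norm(params[name]) + eps)
        for name in params
    }


def per_layer_learning_rates(params, grads, base_lr):
    """Assumed rule: the layer with the largest ratio receives base_lr,
    and every other layer receives a proportionally smaller rate."""
    rgn = relative_gradient_norms(params, grads)
    max_rgn = max(rgn.values())
    if max_rgn == 0.0:
        # All gradients are zero, so no layer needs an update.
        return {name: 0.0 for name in rgn}
    return {name: base_lr * value / max_rgn for name, value in rgn.items()}


# Toy example with two hypothetical layers that see the same gradient.
# The early block has larger weights, so its relative gradient norm is smaller.
params = {"early_block": [3.0, 4.0], "classifier": [0.6, 0.8]}
grads = {"early_block": [0.3, 0.4], "classifier": [0.3, 0.4]}

lrs = per_layer_learning_rates(params, grads, base_lr=0.01)
for name, lr in lrs.items():
    print(f"{name}: {lr:.4f}")
```

Under this assumed rule, layers whose gradients are large relative to their weights are updated more aggressively, while layers that already fit the target data stay close to their pre-trained values. In a real training loop the rates would be recomputed periodically from the actual network gradients and passed to the optimizer as per-layer parameter groups.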
