
EFCM: Efficient Fine-tuning on Compressed Models for deployment of large models in medical image analysis

Shaojie Li, Zhaoshuo Diao

TL;DR

The experimental results demonstrate that the EFCM framework significantly improves accuracy and efficiency on slide-level pathology image tasks, effectively addressing the challenges of deploying large medical models.

Abstract

Recent large deep learning models in medicine achieve remarkable performance in medical image analysis and diagnosis, but their large number of parameters causes memory and inference-latency challenges. Knowledge distillation offers a solution, but for high-resolution pathological images with only slide-level labels, slide-level gradients cannot be backpropagated to update the student model. This study presents an Efficient Fine-tuning on Compressed Models (EFCM) framework with two stages: unsupervised feature distillation and fine-tuning. In the distillation stage, Feature Projection Distillation (FPD) is proposed, with a TransScan module that adaptively adjusts the receptive field to enhance the student model's capacity to absorb knowledge. In the slide-level fine-tuning stage, three strategies are compared: Reuse CLAM, Retrain CLAM, and End2end Train CLAM (ETC). Experiments are conducted on 11 downstream datasets related to three large medical models: RETFound for retina, MRM for chest X-ray, and BROW for histopathology. The experimental results demonstrate that the EFCM framework significantly improves accuracy and efficiency on slide-level pathology image tasks, effectively addressing the challenges of deploying large medical models. Specifically, it achieves a 4.33% increase in ACC and a 5.2% increase in AUC compared to the large model BROW on the TCGA-NSCLC and TCGA-BRCA datasets. An analysis of inference efficiency further confirms the efficiency of the distillation fine-tuning approach.
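In the slide-level stage, CLAM aggregates the patch (instance) features of one WSI into a single slide prediction through attention-based multiple-instance pooling. The NumPy sketch below illustrates a simplified single-branch version of that pooling, using the gated attention formulation of Ilse et al. (2018) that CLAM's attention network follows. It is a sketch under assumptions, not the authors' code: the function name `gated_attention_pool`, the sizes, and the random weights are hypothetical, and CLAM's feature-reduction layer and instance-level clustering loss are omitted.

```python
import numpy as np

def gated_attention_pool(h, V, U, w):
    """Pool instance features h (K x D) into one slide-level embedding.

    Gated attention-based MIL pooling (simplified, single branch):
        a_k = softmax over k of  w^T [tanh(V h_k) * sigmoid(U h_k)]
        z   = sum_k a_k h_k
    """
    gate = 1.0 / (1.0 + np.exp(-(h @ U.T)))   # (K, L) sigmoid gate
    hidden = np.tanh(h @ V.T) * gate           # (K, L) gated hidden units
    scores = hidden @ w                         # (K,) unnormalized attention scores
    scores = scores - scores.max()              # shift for a numerically stable softmax
    a = np.exp(scores) / np.exp(scores).sum()   # (K,) attention weights, non-negative, sum to 1
    z = a @ h                                   # (D,) attention-weighted slide embedding
    return z, a

# Hypothetical sizes: K patches from one WSI, D-dim patch features, L hidden units, C classes.
K, D, L, C = 500, 384, 128, 2
rng = np.random.default_rng(0)
h = rng.standard_normal((K, D))              # stand-in for patch features from the student encoder
V = 0.05 * rng.standard_normal((L, D))       # attention branch weights (random, untrained)
U = 0.05 * rng.standard_normal((L, D))       # gate branch weights (random, untrained)
w = 0.05 * rng.standard_normal(L)            # attention scoring vector
W_cls = 0.05 * rng.standard_normal((D, C))   # hypothetical slide-level linear classifier

z, a = gated_attention_pool(h, V, U, w)
logits = z @ W_cls                           # (C,) slide-level class scores
top = np.argsort(a)[::-1][:5]                # indices of the five most-attended patches
print(z.shape, logits.shape, top)
```

The per-patch weights `a` are the kind of attention scores rendered as heatmaps in Figure 4. Under the ETC strategy, a slide-level loss on `logits` would also be backpropagated into the student encoder that produced `h`, which is consistent with the paper restricting each WSI to a limited number of instances (Figure 1, Stage 3).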

Paper Structure

This paper contains 26 sections, 11 equations, 7 figures, and 11 tables.

Figures (7)

  • Figure 1: The EFCM framework for slide-level pathology images. Stage 1: Extract tissue regions from the WSI and extract patches within these regions. Stage 2: Use a large pre-trained model as the teacher to transfer knowledge to the student model through distillation. Stage 3: Use instance features extracted by the teacher model to train the Information Bottleneck (IB) module, which generates instance masks that select a limited number of instances per WSI. Stage 4: Fine-tune the distilled student model end-to-end, then use the fine-tuned student model as a feature extractor on all instances to train a new CLAM classifier.
  • Figure 2: Comparison of Vanilla Feature Distillation (VFD) and Feature Projection Distillation (FPD). The two differ mainly in the student model design and in how the student model parameters are updated. (a) In VFD, all student model parameters are updated jointly. (b) In our FPD, the shallow CNN is frozen and only the projection parameters are updated.
  • Figure 3: Performance of the two distillation models on pathology image datasets under three fine-tuning strategies. VFD is shown in sky blue, FPD in salmon, and the large model's metric on each dataset as a light gray dashed line.
  • Figure 4: Visualization of results for selected cases from the TCGA-NSCLC dataset. The first column shows the ground-truth lesion area outlined in blue, with red rectangles marking local ROIs that highlight the boundary between tumor and normal tissue. Columns 2 to 4 show the attention heatmaps predicted by the large model, VFD, and FPD within these ROIs. Warmer colors indicate a higher estimated probability of tumor tissue.
  • Figure 5: Transferability of the distillation models under the ETC fine-tuning strategy, assessed by fine-tuning across datasets. VFD_ETC is shown in sky blue and FPD_ETC in salmon.
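Figure 2 contrasts VFD, which updates all student parameters, with FPD, which freezes the shallow CNN and updates only the projection. The NumPy sketch below illustrates the FPD update under stated assumptions: the teacher, the frozen backbone, and the projection are toy stand-ins (a fixed tanh layer, a fixed random ReLU layer, and a single linear map `P`), the unsupervised feature-matching loss is assumed to be mean squared error, and the TransScan module is not modeled. All names and sizes are hypothetical. Only `P` receives gradient updates; the backbone weights `B` stay fixed.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: N patches, raw input dim, frozen student feature dim, teacher feature dim.
N, D_in, D_s, D_t = 256, 48, 64, 32
X = rng.standard_normal((N, D_in))                  # stand-in for patch inputs (no labels used)

# Toy teacher: fixed features that the student should imitate.
A = rng.standard_normal((D_in, D_t)) / np.sqrt(D_in)
T = np.tanh(X @ A)                                  # (N, D_t) teacher features

# Toy frozen "shallow CNN": a fixed random ReLU layer that FPD never updates.
B = rng.standard_normal((D_in, D_s)) / np.sqrt(D_in)
B_initial = B.copy()
S = np.maximum(X @ B, 0.0)                          # (N, D_s) frozen student features

# Trainable projection from the student feature space into the teacher feature space.
P = np.zeros((D_s, D_t))

def distill_loss(P):
    """Assumed feature-matching loss: mean squared error to the teacher features."""
    return np.mean((S @ P - T) ** 2)

# Step size 1/L, where L is the largest eigenvalue of the loss Hessian with respect to P,
# so gradient descent on this convex quadratic lowers the loss at every step.
L_smooth = 2.0 * np.linalg.eigvalsh(S.T @ S).max() / (N * D_t)
lr = 1.0 / L_smooth

loss_before = distill_loss(P)
for _ in range(500):
    grad_P = 2.0 * S.T @ (S @ P - T) / (N * D_t)    # dLoss/dP; B receives no update
    P -= lr * grad_P
loss_after = distill_loss(P)

# If this toy student were trained as in VFD (all parameters updated jointly), B would be trainable too.
fpd_trainable = P.size
vfd_trainable = B.size + P.size
print(f"loss {loss_before:.4f} -> {loss_after:.4f}")
print("FPD trainable params:", fpd_trainable, "| VFD-style trainable params:", vfd_trainable)
```

Freezing `B` makes this toy objective a convex least-squares problem in `P`, so a safe step size follows directly from its largest curvature; with a nonlinear projection and real image features, that guarantee does not carry over.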