Efficient Knowledge Distillation of SAM for Medical Image Segmentation
Kunal Dasharath Patil, Gowthamaan Palani, Ganapathy Krishnamurthi
TL;DR
The paper targets the high computational cost of SAM in medical image segmentation and proposes KD SAM, a decoupled knowledge-distillation approach. It first transfers knowledge from SAM's ViT-H encoder to a lightweight ResNet-50 encoder using a dual loss of $L_{MSE}$ and $L_{P}$, then fine-tunes the decoder with Dice Loss. The method achieves segmentation accuracy comparable or superior to that of SAM and MobileSAM on diverse medical datasets while reducing the parameter count to 26.4M. This efficiency enables real-time medical image segmentation in resource-constrained settings. The work demonstrates that decoupled encoder-decoder distillation with perceptual guidance can preserve high-detail segmentation with much smaller models.
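One plausible way to write the dual encoder-distillation objective is sketched below. The exact form is an assumption: the summary names only $L_{MSE}$ and $L_{P}$, so the teacher and student embeddings $F_T(x)$ and $F_S(x)$, the fixed feature extractor $\phi_k$, the layer count $K$, the element counts $N$ and $N_k$, and the weight $\lambda$ are illustrative symbols rather than the authors' notation.

$$
L_{MSE} = \frac{1}{N}\,\lVert F_T(x) - F_S(x) \rVert_2^2,
\qquad
L_{P} = \sum_{k=1}^{K} \frac{1}{N_k}\,\lVert \phi_k(F_T(x)) - \phi_k(F_S(x)) \rVert_2^2,
\qquad
L_{distill} = L_{MSE} + \lambda\, L_{P}
$$

Under this reading, $L_{MSE}$ matches the embeddings element by element, while $L_{P}$ compares them after a fixed feature transform, so that the student is also penalized for differences in higher-level structure.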
Abstract
The Segment Anything Model (SAM) has set a new standard in interactive image segmentation, offering robust performance across various tasks. However, its significant computational requirements limit its deployment in real-time or resource-constrained environments. To address these challenges, we propose a novel knowledge distillation approach, KD SAM, which incorporates both encoder and decoder optimization through a combination of Mean Squared Error (MSE) and Perceptual Loss. This dual-loss framework captures structural and semantic features, enabling the student model to maintain high segmentation accuracy while reducing computational complexity. Evaluated on datasets including Kvasir-SEG, ISIC 2017, Fetal Head Ultrasound, and Breast Ultrasound, KD SAM achieves performance comparable or superior to the baseline models with significantly fewer parameters. KD SAM effectively balances segmentation accuracy and computational efficiency, making it well-suited for real-time medical image segmentation applications in resource-constrained environments.
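To make two of the named losses concrete, here is a minimal pure-Python sketch of an MSE feature-matching term and the standard soft Dice loss. It is a toy illustration on flat lists, not the authors' implementation: the function names `mse_loss` and `dice_loss` and the smoothing constant `eps` are assumptions introduced for this example, and the Perceptual Loss is omitted because it requires a feature extractor the summary does not specify.

```python
def mse_loss(teacher_feats, student_feats):
    """Mean squared error between two equal-length feature vectors."""
    if len(teacher_feats) != len(student_feats):
        raise ValueError("feature vectors must have the same length")
    squared_diffs = [(t - s) ** 2 for t, s in zip(teacher_feats, student_feats)]
    return sum(squared_diffs) / len(squared_diffs)


def dice_loss(pred_probs, target_mask, eps=1e-6):
    """Soft Dice loss: 1 - 2|P.G| / (|P| + |G|) over flattened masks.

    pred_probs holds per-pixel foreground probabilities in [0, 1];
    target_mask holds the ground-truth labels (0 or 1).
    eps is an assumed smoothing term that avoids division by zero.
    """
    if len(pred_probs) != len(target_mask):
        raise ValueError("prediction and target must have the same length")
    intersection = sum(p * g for p, g in zip(pred_probs, target_mask))
    denominator = sum(pred_probs) + sum(target_mask)
    return 1.0 - (2.0 * intersection + eps) / (denominator + eps)


if __name__ == "__main__":
    # Toy encoder embeddings (teacher vs. student).
    print(mse_loss([1.0, 2.0], [1.0, 4.0]))
    # Toy flattened 2x2 mask: perfect overlap, then no overlap.
    print(dice_loss([1.0, 0.0, 1.0, 0.0], [1, 0, 1, 0]))
    print(dice_loss([1.0, 1.0, 0.0, 0.0], [0, 0, 1, 1]))
```

In a real pipeline these terms would be computed on tensors with automatic differentiation; the list-based version only shows what each loss measures.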
