Efficient Quantization-Aware Training on Segment Anything Model in Medical Images and Its Deployment

Haisheng Lu, Yujie Fu, Fan Zhang, Le Zhang

TL;DR

This work tackles the high inference cost of MedSAM in medical image segmentation with a quantization-aware training (QAT) pipeline that quantizes LiteMedSAM's image encoder and mask decoder to 8 bits while keeping the prompt encoder in floating point, then deploys the quantized model on the OpenVINO CPU inference engine. The approach uses Brevitas for QAT, follows a three-stage training strategy with teacher distillation and end-to-end fine-tuning, and exports through the QCDQ ONNX pathway to maximize compatibility with OpenVINO. Results show substantial CPU speedups over the baseline with competitive accuracy, along with better balance across 11 medical imaging modalities. The result is a practical, storage-efficient, and hardware-friendly segmentation model for clinical and research use, with code publicly available for reproducibility and extension.
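
As a rough illustration of what QAT simulates in the forward pass, the sketch below quantizes a list of floats to signed 8-bit integers and maps them back to floats. It assumes symmetric per-tensor scaling, and the name `fake_quantize` is hypothetical; it is not the Brevitas configuration used by the authors.

```python
def fake_quantize(values, num_bits=8):
    """Quantize floats to signed integers, then dequantize them back to floats.

    Assumption: symmetric per-tensor scaling chosen for illustration only.
    """
    qmax = 2 ** (num_bits - 1) - 1   # 127 for 8 bits
    qmin = -qmax - 1                 # -128 for 8 bits
    max_abs = max((abs(v) for v in values), default=0.0)
    # Avoid a zero scale when every value is zero.
    scale = max_abs / qmax if max_abs > 0 else 1.0
    # Round to the nearest integer and clip to the representable range.
    ints = [min(qmax, max(qmin, round(v / scale))) for v in values]
    # Dequantize: the network keeps computing with these float approximations.
    dequantized = [q * scale for q in ints]
    return dequantized, ints, scale


values = [0.5, -1.0, 0.25, 0.0]
dequantized, ints, scale = fake_quantize(values)
for original, approx in zip(values, dequantized):
    print(f"{original:+.4f} -> {approx:+.4f}")
```

Each value is reproduced to within half a quantization step (`scale / 2`), which is the rounding error the network learns to tolerate during quantization-aware training.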

Abstract

Medical image segmentation is a critical component of clinical practice, and the state-of-the-art MedSAM model has significantly advanced this field. Nevertheless, critiques highlight that MedSAM demands substantial computational resources during inference. To address this issue, the CVPR 2024 MedSAM on Laptop Challenge was established to find an optimal balance between accuracy and processing speed. In this paper, we introduce a quantization-aware training pipeline designed to efficiently quantize the Segment Anything Model for medical images and deploy it using the OpenVINO inference engine. This pipeline optimizes both training time and disk storage. Our experimental results confirm that this approach considerably enhances processing speed over the baseline, while still achieving an acceptable accuracy level. The training script, inference script, and quantized model are publicly accessible at https://github.com/AVC2-UESTC/QMedSAM.

Paper Structure

This paper contains 20 sections, 2 equations, 3 figures, 10 tables.

Figures (3)

  • Figure 1: Common quantized sub-layers. (a) quantized linear layer; (b) quantized convolutional layer; (c) quantized attention block. Circles represent the corresponding calculations: M stands for matrix multiplication, C for convolution, and T for transpose. Operations involving quantization are drawn as rounded rectangles. The inputs and outputs of all the sub-layers depicted are floating-point tensors.
  • Figure 2: Good segmentation results. (a) Image and box; (b) Ground truth; (c) Baseline; (d) Proposed method.
  • Figure 3: Bad segmentation results. (a) Image and box; (b) Ground truth; (c) Baseline; (d) Proposed method.
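
The Figure 1 caption notes that each quantized sub-layer takes and returns floating-point tensors, with quantization happening inside. One plausible reading of the quantized linear layer in panel (a) can be sketched as follows. The helper names, the symmetric per-tensor 8-bit scheme, and the floating-point bias are assumptions made for illustration; the figure itself may differ in detail.

```python
def quantize(values, num_bits=8):
    """Return signed integer codes and the scale that maps them back to floats."""
    qmax = 2 ** (num_bits - 1) - 1
    max_abs = max((abs(v) for v in values), default=0.0)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    codes = [max(-qmax - 1, min(qmax, round(v / scale))) for v in values]
    return codes, scale


def quantized_linear(x, weight, bias):
    """Float in, float out; the matrix multiplication runs on integer codes.

    x:      list of in_features floats
    weight: list of out_features rows, each with in_features floats
    bias:   list of out_features floats (kept in floating point here)
    """
    x_q, x_scale = quantize(x)
    # Quantize the whole weight matrix with one shared (per-tensor) scale.
    flat_q, w_scale = quantize([w for row in weight for w in row])
    n = len(x)
    w_q = [flat_q[i * n:(i + 1) * n] for i in range(len(weight))]
    out = []
    for row_q, b in zip(w_q, bias):
        acc = sum(xi * wi for xi, wi in zip(x_q, row_q))  # integer accumulation
        out.append(acc * x_scale * w_scale + b)           # rescale to float
    return out


x = [1.0, -0.5]
weight = [[0.5, 0.25], [-1.0, 1.0]]
bias = [0.1, 0.0]
print(quantized_linear(x, weight, bias))  # close to the float result [0.475, -1.5]
```

Because the rescaling happens before the output leaves the layer, neighbouring layers never see integer tensors, which matches the caption's statement about floating-point inputs and outputs.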