Multi-Task Multi-Scale Contrastive Knowledge Distillation for Efficient Medical Image Segmentation

Risab Biswas

TL;DR

The work tackles efficient medical image segmentation under data and compute constraints by transferring knowledge from a large multi-task teacher network to a compact student network. It proposes a unified framework that combines multi-scale feature distillation, supervised contrastive learning (InfoNCE), and a prediction-map distillation loss to align teacher and student representations at the encoder, bottleneck, and decoder scales. Experiments on spleen CT segmentation show that encoder-to-encoder distillation with contrastive learning yields the strongest gains: a smaller model trained on half the data approaches or surpasses the performance of larger baselines, with statistical significance. The approach is therefore practical for deploying accurate medical image segmentation models in resource-limited clinical settings, since it reduces data and computation requirements without sacrificing performance.
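
The contrastive term mentioned above can be illustrated with a minimal sketch of an InfoNCE loss for a single anchor. This is not the thesis's implementation: the function names, the cosine similarity, the temperature value of 0.1, and the use of a student embedding as anchor with teacher embeddings as positive and negatives are all assumptions made for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(anchor, positive, negatives, temperature=0.1):
    """InfoNCE for one anchor: -log(exp(s_pos / t) / sum_k exp(s_k / t)).

    Hypothetical setup: `anchor` is a student embedding, `positive` is a
    teacher embedding with the same label, and `negatives` are teacher
    embeddings with a different label.
    """
    logits = [cosine(anchor, positive) / temperature]
    logits += [cosine(anchor, n) / temperature for n in negatives]
    # Subtract the maximum before exponentiating (log-sum-exp trick).
    m = max(logits)
    log_denominator = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_denominator - logits[0]


# The loss is small when the anchor matches its positive and large otherwise.
aligned = info_nce([1.0, 0.0], [1.0, 0.0], [[0.0, 1.0]])
misaligned = info_nce([1.0, 0.0], [0.0, 1.0], [[1.0, 0.0]])
```

In the supervised setting the positive and negative sets would be chosen from the segmentation labels, so minimizing this loss pulls same-class student and teacher embeddings together and pushes different-class embeddings apart.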

Abstract

This thesis investigates the feasibility of knowledge transfer between neural networks for medical image segmentation, focusing on transfer from a larger multi-task "Teacher" network to a smaller "Student" network. In medical imaging, where data volumes are often limited, leveraging knowledge from a larger pre-trained network could be useful. The primary objective is to enhance the performance of the student model by incorporating knowledge representations acquired by a teacher model with a multi-task pre-trained architecture trained on CT images. The student is a more resource-efficient network, essentially a smaller version of the teacher, and is trained on only 50% of the data used for the teacher. To facilitate knowledge transfer between the two models, we devised an architecture incorporating multi-scale feature distillation and supervised contrastive learning. We investigate whether this approach is particularly effective in scenarios with limited computational resources and limited training data. To assess the impact of multi-scale feature distillation, we conducted extensive experiments, including a detailed ablation study to determine whether it is essential to distil knowledge at various scales, including low-level features from encoder layers, for effective knowledge transfer. In addition, we examine different losses in the knowledge distillation process to gain insight into their effects on overall performance.
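
To make the idea of distilling at several scales concrete, here is a minimal sketch of a weighted multi-scale feature distillation loss. It is an illustration under stated assumptions, not the thesis's method: the mean squared error distance, the scale names used as dictionary keys, the per-scale weights, and the premise that student features have already been projected to the teacher's shape and flattened are all hypothetical.

```python
def mse(a, b):
    """Mean squared error between two equal-length flattened feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def multi_scale_distillation(student_feats, teacher_feats, weights):
    """Weighted sum of per-scale feature distances.

    `weights` maps a scale name to its coefficient. Leaving a scale out of
    `weights` switches it off, which mimics an ablation over scales.
    """
    total = 0.0
    for scale, weight in weights.items():
        total += weight * mse(student_feats[scale], teacher_feats[scale])
    return total


# Toy flattened features at three hypothetical scales.
student = {"encoder": [1.0, 2.0], "bottleneck": [0.0], "decoder": [1.0, 1.0]}
teacher = {"encoder": [1.0, 4.0], "bottleneck": [0.0], "decoder": [0.0, 1.0]}

all_scales = multi_scale_distillation(
    student, teacher, {"encoder": 1.0, "bottleneck": 1.0, "decoder": 1.0}
)
encoder_only = multi_scale_distillation(student, teacher, {"encoder": 1.0})
```

Passing only the encoder weight corresponds to an encoder-to-encoder configuration, while passing all three weights distils at every scale. In a full training objective this term would be added to the segmentation loss and to any contrastive or prediction-level terms.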

Paper Structure

This paper contains 40 sections, 18 equations, 64 figures, and 11 tables.

Figures (64)

  • Figure 1: An example of multi-class medical image segmentation [chen2021transunet]
  • Figure 2: An example of binary medical image segmentation
  • Figure 3: Representation of Contrastive Pairs
  • Figure 4: Teacher-Student Framework for Knowledge Distillation [Gou_2021]
  • Figure 5: Processed 2D slices and their corresponding ground truth mask
  • ...and 59 more figures