Dual-Model Weight Selection and Self-Knowledge Distillation for Medical Image Classification

Ayaka Tsutsumi, Guang Li, Ren Togo, Takahiro Ogawa, Satoshi Kondo, Miki Haseyama

TL;DR

This paper tackles the challenge of deploying accurate medical image classifiers under tight computational constraints. It introduces a dual-model weight selection strategy that initializes two lightweight models from a large pretrained teacher, combined with self-knowledge distillation that uses an EMA-based auxiliary teacher to refine learning without excessive additional computational cost. Across chest X-ray, lung CT, and brain MRI datasets, the approach yields consistent accuracy gains, particularly in data-scarce scenarios, while maintaining efficiency. The work offers a practical pathway to robust, resource-efficient medical imaging systems suitable for real-world clinical deployment.

Abstract

We propose a novel medical image classification method that integrates dual-model weight selection with self-knowledge distillation (SKD). In real-world medical settings, the deployment of large-scale models is often limited by computational resource constraints. Thus, developing lightweight models that achieve performance comparable to large-scale models while maintaining computational efficiency is crucial. To address this, we employ a dual-model weight selection strategy that initializes two lightweight models with weights derived from a large pretrained model, enabling effective knowledge transfer. Next, SKD is applied to these selected models, allowing the use of a broad range of initial weight configurations without imposing excessive additional computational cost, followed by fine-tuning for the target classification tasks. By combining dual-model weight selection with self-knowledge distillation, our method overcomes the limitations of conventional approaches, which often fail to retain critical information in compact models. Extensive experiments on publicly available datasets (chest X-ray images, lung computed tomography scans, and brain magnetic resonance imaging scans) demonstrate the superior performance and robustness of our approach compared to existing methods.
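
The idea of initializing two smaller models from one larger pretrained model can be illustrated with a minimal sketch. The abstract does not state how weights are picked or how the two selections differ, so everything here is an assumption made for illustration: the helper `select_weights`, the toy 4x4 teacher layer, and the choice of two different block offsets are all hypothetical.

```python
def select_weights(teacher, rows, cols, row_offset=0, col_offset=0):
    """Copy a rows x cols block out of a larger teacher weight matrix.

    Hypothetical helper: contiguous-block selection is an assumption,
    not the selection rule described in the paper.
    """
    if row_offset + rows > len(teacher) or col_offset + cols > len(teacher[0]):
        raise ValueError("student block does not fit inside the teacher matrix")
    return [
        list(teacher[r][col_offset:col_offset + cols])
        for r in range(row_offset, row_offset + rows)
    ]


# Toy 4x4 "teacher" layer with entries 0..15 (illustrative values only).
teacher_layer = [[4 * r + c for c in range(4)] for r in range(4)]

# Two 2x2 student layers initialized from different parts of the teacher.
# Using different offsets for the two students is an assumption.
main_student = select_weights(teacher_layer, 2, 2)
aux_student = select_weights(teacher_layer, 2, 2, row_offset=2, col_offset=2)

print(main_student)
print(aux_student)
```

In this sketch both students inherit pretrained values rather than random ones, and they start from different initial configurations, which is the property the abstract attributes to the dual-model strategy.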

Paper Structure

This paper contains 18 sections, 7 equations, 12 figures, 11 tables.

Figures (12)

  • Figure 1: Overview of the proposed dual-model weight selection method. The teacher model $T$ is used to initialize the main student model $S$ and the auxiliary student model $S'$.
  • Figure 2: Overview of the proposed self-knowledge distillation method. The weights of the auxiliary student model $S'$ are the exponential moving average (EMA) of the weights of the main student model $S$, and SG denotes the stop gradient operation.
  • Figure 3:
  • Figure 4:
  • Figure 6:
  • ...and 7 more figures
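
The Figure 2 caption states that the weights of the auxiliary student $S'$ are the exponential moving average (EMA) of the weights of the main student $S$. A minimal sketch of such an update follows, assuming the common form $\theta' \leftarrow m\,\theta' + (1 - m)\,\theta$. The function name `ema_update`, the decay value, and the toy weight dictionaries are hypothetical and not taken from the paper.

```python
def ema_update(aux_weights, main_weights, decay=0.99):
    """Return new auxiliary weights: decay * aux + (1 - decay) * main.

    The decay value is an assumed placeholder; the source does not give one.
    """
    return {
        name: decay * aux_weights[name] + (1.0 - decay) * main_weights[name]
        for name in aux_weights
    }


# Toy scalar "weights" for the main student S and auxiliary student S'.
main = {"w": 1.0, "b": 0.0}
aux = {"w": 0.0, "b": 2.0}

# One update step with decay 0.5, chosen so the arithmetic is easy to check.
aux = ema_update(aux, main, decay=0.5)
print(aux)
```

Because the auxiliary weights are computed from the main weights rather than trained by backpropagation, no gradient needs to flow through $S'$. In an autodiff framework this is where the stop gradient (SG) operation from the caption would typically be applied, for example by detaching the outputs of $S'$ before computing the distillation loss.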