Evolution-aware VAriance (EVA) Coreset Selection for Medical Image Classification

Yuxin Hong, Xiao Zhang, Xin Zhang, Joey Tianyi Zhou

TL;DR

Evolution-aware VAriance (EVA) is a novel coreset selection strategy that captures the evolutionary process of model training through a dual-window approach and uses variance to measure the fluctuation of sample importance more precisely, showing its potential for efficient medical image analysis.

Abstract

In the medical field, managing massive, high-dimensional imaging data and performing reliable analysis on it is a critical challenge, especially in resource-limited environments such as remote medical facilities and mobile devices. This calls for effective dataset compression techniques that reduce storage, transmission, and computational costs. However, existing coreset selection methods are designed primarily for natural image datasets, and their effectiveness on medical image datasets is doubtful because of challenges such as intra-class variation and inter-class similarity. In this paper, we propose a novel coreset selection strategy, Evolution-aware VAriance (EVA), which captures the evolutionary process of model training through a dual-window approach and reflects the fluctuation of sample importance more precisely through variance measurement. Extensive experiments on medical image datasets demonstrate the effectiveness of our strategy over previous SOTA methods, especially at high compression rates. EVA achieves 98.27% accuracy with only 10% of the training data, compared to 97.20% for the full training set. At a 5% selection rate, none of the compared baseline methods exceeds Random, whereas EVA outperforms Random by 5.61%, showcasing its potential for efficient medical image analysis.

Paper Structure

This paper contains 22 sections, 6 equations, 8 figures, 7 tables, and 1 algorithm.

Figures (8)

  • Figure 1: Existing single-timeframe (single-window) snapshot methods fail to capture fluctuations in sample importance across epochs. Each sample is shown in a different color. Here, importance is measured with the error vector score, a snapshot-based criterion defined in Paul et al. (2021), computed over only the first 10 epochs, as indicated by the dashed box. The scores are obtained by training ResNet-18 on OrganAMNIST.
  • Figure 2: The pipeline of our proposed EVA. First, we record the predicted probabilities $f_{t}^{(i)} = f_{\bm{\theta}_t}(\bm{x}_{i})$ of each sample during training. Then, we compute a score $\mathcal{S}_{t}^{(i)}$ for each sample, i.e., the L2 norm of the error vector. Next, the variance of the scores within a window of epochs is calculated to reflect the fluctuation of each sample's contribution; the samples that fluctuate the most are considered important in that stage. Finally, we identify the samples that exhibit high importance across the two windows.
  • Figure 3: Ablation study on the summary statistics. We validate the effectiveness of variance measurement under single-window and dual-window settings on OrganAMNIST (a)(b) and OrganSMNIST (c)(d). Panels (a) and (c) contrast the Exp-S and Var-S strategies within an early 10-epoch window; panels (b) and (d) compare the Exp-D and Var-D strategies in the dual-window setting.
  • Figure 4: Ablation study on the window setting. The results are obtained on OrganAMNIST (top row) and OrganSMNIST (bottom row). Performance of the Var-S versus Var-D strategies is illustrated in (a) and (c), while (b) and (d) show comparisons between Exp-S and Exp-D strategies.
  • Figure 5: Comparison of different window combinations. These windows represent different training phases. (a)-(d) show experimental results for OrganSMNIST, and (e)-(h) for OrganAMNIST, with each line depicting a unique window combination (single or dual windows).
  • ...and 3 more figures
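
The pipeline described in the Figure 2 caption can be sketched in NumPy as below. This is a minimal illustration, not the authors' implementation: the function names, the array layout `(epochs, samples, classes)`, and especially the rule for combining the two windows (half of the budget from the early window, the rest filled from the late window) are assumptions made for the example, since the caption only states that samples with high importance in both windows are identified.

```python
import numpy as np


def error_l2_scores(probs, labels):
    """Per-epoch score: L2 norm of the error vector (probabilities minus one-hot label).

    probs  : array of shape (T, N, C), predicted probabilities for T epochs,
             N samples, C classes (layout is an assumption of this sketch).
    labels : integer array of shape (N,).
    Returns an array of shape (T, N).
    """
    num_classes = probs.shape[-1]
    one_hot = np.eye(num_classes)[labels]            # (N, C)
    return np.linalg.norm(probs - one_hot[None], axis=-1)


def window_variance(scores, start, end):
    """Variance of each sample's score over epochs start..end-1."""
    return scores[start:end].var(axis=0)


def select_dual_window(scores, early, late, budget):
    """Pick `budget` sample indices using two epoch windows.

    Assumption (not from the source): half of the budget goes to the
    highest-variance samples of the early window; the remainder is filled
    with the highest-variance samples of the late window not yet chosen.
    """
    var_early = window_variance(scores, *early)
    var_late = window_variance(scores, *late)

    chosen = [int(i) for i in np.argsort(-var_early)[: budget // 2]]
    taken = set(chosen)
    for idx in np.argsort(-var_late):
        if len(chosen) >= budget:
            break
        if int(idx) not in taken:
            chosen.append(int(idx))
            taken.add(int(idx))
    return np.array(chosen)


# Toy usage: 4 epochs, 3 samples. Sample 0 never changes, sample 1
# fluctuates early, sample 2 fluctuates late.
toy_scores = np.array([
    [0.5, 0.1, 0.3],
    [0.5, 0.9, 0.3],
    [0.5, 0.5, 0.0],
    [0.5, 0.5, 1.0],
])
selected = select_dual_window(toy_scores, early=(0, 2), late=(2, 4), budget=2)
```

In the toy example the constant sample is never picked, which is the intended contrast with a snapshot criterion that ranks samples by their score at a single point in training.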