Data Diet: Can Trimming PET/CT Datasets Enhance Lesion Segmentation?

Alexander Jaus; Simon Reiß; Jens Kleesiek; Rainer Stiefelhagen

TL;DR

This work finds that, on the autoPETIII dataset, a model trained on the entire dataset produces a large number of false positives, particularly for PSMA-PETs.

Abstract

In this work, we describe our approach to competing in the autoPET3 data-centric track. While conventional wisdom suggests that larger datasets lead to better model performance, recent studies indicate that excluding certain training samples can enhance model accuracy. We find that, on the autoPETIII dataset, a model trained on the entire dataset exhibits undesirable characteristics, producing a large number of false positives, particularly for PSMA-PETs. We counteract this by removing the easiest samples from the training dataset, as measured by the model loss, before retraining from scratch. Using the proposed approach, we manage to drive down the false negative volume and improve upon the baseline model in both false negative volume and Dice score on the preliminary test set. Code and pre-trained models are available at github.com/alexanderjaus/autopet3_datadiet.
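The trimming step described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it assumes that one scalar loss per training case is available from the baseline model, that "easiest" means lowest loss, and that the cut-off is a percentile of the loss distribution (the paper's outline mentions 3rd and 5th percentiles). The function name `select_training_subset`, the case identifiers, and the loss values are hypothetical.

```python
import numpy as np


def select_training_subset(case_ids, losses, percentile=5.0):
    """Split cases into (kept, dropped) by a loss percentile.

    Assumption: cases with a baseline loss at or below the given
    percentile are treated as the "easiest" and are dropped.
    """
    losses = np.asarray(losses, dtype=float)
    threshold = np.percentile(losses, percentile)
    keep_mask = losses > threshold
    kept = [c for c, keep in zip(case_ids, keep_mask) if keep]
    dropped = [c for c, keep in zip(case_ids, keep_mask) if not keep]
    return kept, dropped


# Hypothetical per-case losses from a baseline model (synthetic values).
case_ids = [f"case_{i:03d}" for i in range(100)]
losses = np.linspace(0.01, 1.0, 100)

kept, dropped = select_training_subset(case_ids, losses, percentile=5.0)
print(f"kept {len(kept)} cases, dropped {len(dropped)} cases")
```

The retained cases would then form the training set for a model retrained from scratch, as the abstract describes.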

Paper Structure (12 sections, 4 figures, 2 tables)

Introduction and Motivation
Method
Baseline Segmentation Model
Analyzing the Baseline
Analyzing the Dataset
Proposed Method
Results
Discussion and Limitation
Acknowledgements
Excluded Patients in $5^{\text{th}}$ percentile
Excluded Patients in $3^{\text{rd}}$ percentile
Comparison of False Negative Distribution Pre- vs. Post Data Diet

Figures (4)

Figure 1: Analysis of the baseline model on the entire autoPET training dataset
Figure 2: Comparison of the number of samples for each tracer within the autoPET training dataset and the morbidity ratio for each tracer.
Figure 3: QQ plot of the pre- vs. post-data-diet false positive volume distributions. The dashed red line is the reference indicating equal distributions.
Figure 4: QQ plot of the pre- vs. post-data-diet false negative volume distributions. The dashed red line is the reference indicating equal distributions.
