From Barlow Twins to Triplet Training: Differentiating Dementia with Limited Data

Yitong Li; Tom Nuno Wolf; Sebastian Pölsterl; Igor Yakushev; Dennis M. Hedderich; Christian Wachinger

TL;DR

The paper addresses differential diagnosis of dementia types from structural MRI when target data are scarce. It introduces Triplet Training, a three-stage framework: self-supervised pre-training with Barlow Twins on large unlabeled MRI data, self-distillation on a task-related dataset, and fine-tuning on a small in-house target set. The approach achieves a balanced accuracy of 75.6% on the target task. Visualizations of the latent space after each stage give insight into the training process, and an ablation study validates the robustness of the individual components. The code is publicly available for replication.
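The headline metric is balanced accuracy, which is the mean of the per-class recalls and therefore does not reward a classifier for favoring the majority class. The sketch below is a minimal pure-Python illustration; the function name `balanced_accuracy` and the toy labels are assumptions made for this example and are not data from the paper. The class names CN, AD, and FTD follow the figure legend.

```python
from collections import defaultdict


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recalls (hypothetical helper, not the authors' code)."""
    correct = defaultdict(int)  # correctly predicted samples per true class
    total = defaultdict(int)    # number of samples per true class
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        if t == p:
            correct[t] += 1
    recalls = [correct[c] / total[c] for c in total]
    return sum(recalls) / len(recalls)


# Toy, made-up labels: an imbalanced test set where the model always says "CN".
y_true = ["CN"] * 6 + ["AD"] * 3 + ["FTD"] * 1
y_pred = ["CN"] * 10

plain_accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(plain_accuracy)                     # fraction of all samples predicted correctly
print(balanced_accuracy(y_true, y_pred))  # mean recall over CN, AD, FTD
```

In this toy case the plain accuracy is 6/10 while the balanced accuracy is 1/3, because only one of the three classes is ever recognized.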

Abstract

Differential diagnosis of dementia is challenging due to overlapping symptoms, with structural magnetic resonance imaging (MRI) being the primary method for diagnosis. Despite the clinical value of computer-aided differential diagnosis, research has been limited, mainly due to the absence of public datasets that contain diverse types of dementia. This leaves researchers with small in-house datasets that are insufficient for training deep neural networks (DNNs). Self-supervised learning shows promise for utilizing unlabeled MRI scans in training, but small batch sizes for volumetric brain scans make its application challenging. To address these issues, we propose Triplet Training for differential diagnosis with limited target data. It consists of three key stages: (i) self-supervised pre-training on unlabeled data with Barlow Twins, (ii) self-distillation on task-related data, and (iii) fine-tuning on the target dataset. Our approach significantly outperforms traditional training strategies, achieving a balanced accuracy of 75.6%. We further provide insights into the training process by visualizing changes in the latent space after each step. Finally, we validate the robustness of Triplet Training in terms of its individual components in a comprehensive ablation study. Our code is available at https://github.com/ai-med/TripletTraining.
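Stage (i) relies on the Barlow Twins objective, which pushes the cross-correlation matrix between the embeddings of two augmented views toward the identity. The NumPy sketch below illustrates that objective in its commonly published form; it is not the authors' implementation. The names `barlow_twins_loss`, `lam`, and `eps` are assumptions for this example, and the default weight `lam=5e-3` is a commonly used value rather than one reported here.

```python
import numpy as np


def barlow_twins_loss(z_a, z_b, lam=5e-3, eps=1e-12):
    """Barlow Twins loss for two (batch, dim) embedding arrays.

    Hypothetical helper: `lam` weights the redundancy-reduction term and
    `eps` only guards against division by zero.
    """
    n = z_a.shape[0]
    # Standardize every embedding dimension along the batch axis.
    z_a = (z_a - z_a.mean(axis=0)) / (z_a.std(axis=0) + eps)
    z_b = (z_b - z_b.mean(axis=0)) / (z_b.std(axis=0) + eps)
    # Cross-correlation matrix between the two views, shape (dim, dim).
    c = z_a.T @ z_b / n
    diag = np.diag(c)
    on_diag = np.sum((1.0 - diag) ** 2)          # invariance term
    off_diag = np.sum(c ** 2) - np.sum(diag ** 2)  # redundancy-reduction term
    return float(on_diag + lam * off_diag)


# Identical views with decorrelated dimensions: the loss is (numerically) zero.
decorrelated = np.array([[1.0, 1.0], [1.0, -1.0], [-1.0, 1.0], [-1.0, -1.0]])
print(barlow_twins_loss(decorrelated, decorrelated))

# Identical views whose two dimensions are copies of each other: only the
# off-diagonal (redundancy) term contributes.
redundant = np.array([[1.0, 1.0], [-1.0, -1.0]])
print(barlow_twins_loss(redundant, redundant))
```

Because both terms are computed per embedding dimension rather than per pair of samples, the objective needs no negative pairs, which is one reason it is a candidate when only small batches of volumetric scans fit in memory.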

Paper Structure (17 sections, 3 equations, 6 figures, 5 tables)

Introduction
Related Work
Differential Diagnosis of AD and FTD with DNNs.
Self-Supervised Learning and Self-Distillation in Medical Image Analysis.
Methods
Preliminaries and Datasets
Triplet Training
Experiments
Results
Visualization of the latent space.
Ablation Study 1: Hyper-parameters.
Ablation Study 2: Benchmark Self-Supervised Approaches.
Conclusion
Architecture
Training Details
...and 2 more sections

Figures (6)

Figure 1: Triplet Training for differential diagnosis of dementia: 1) task-unrelated data is leveraged through self-supervision, 2) self-distillation is performed on task-related data, 3) the network is fine-tuned on the training part of the target dataset and evaluated on the test part.
Figure 2: Overview of the three stages of Triplet Training.
Figure 3: Changes in latent space of all datasets (first row) and the step-wise target dataset (second row) after each step in Triplet Training with UMAP. $\mathcal{U}$: No label (purple, representative fraction of samples to improve readability); Task-related $\mathcal{D}$: CN (dark blue), AD (red), FTD (dark grey); In-house $\mathcal{T}$: CN (light blue), AD (orange), FTD (light grey).
Figure 4: Ablation studies of hyper-parameters in Triplet Training.
Figure 5: We select a 3D ResNet as the feature extractor $f$ for all models. It consists of six residual blocks, each consisting of two convolutional layers followed by batch normalization and ReLU non-linearity. The five last residual blocks each start with a convolutional layer with stride two.
...and 1 more figures
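The Figure 5 caption describes six residual blocks, of which the last five begin with a stride-two convolution. To make the resulting downsampling concrete, the sketch below traces the spatial size along one axis through such a stack. The kernel size of 3, padding of 1, and input size of 64 are assumptions chosen for illustration and are not stated in the caption; the function names are likewise hypothetical.

```python
def conv_out_size(n, kernel=3, stride=2, padding=1):
    """Output length of a convolution along one axis (standard formula)."""
    return (n + 2 * padding - kernel) // stride + 1


def resnet_spatial_sizes(input_size, num_blocks=6, strided_blocks=5):
    """Spatial size after each residual block along one axis.

    Assumes the first block preserves resolution, each of the last
    `strided_blocks` blocks halves it once via its first convolution,
    and all other convolutions preserve size.
    """
    sizes = [input_size]
    n = input_size
    for block in range(num_blocks):
        if block >= num_blocks - strided_blocks:
            n = conv_out_size(n)  # stride-two convolution opening the block
        sizes.append(n)
    return sizes


print(resnet_spatial_sizes(64))  # sizes: input, then after blocks 1 to 6
```

Under these assumptions a 64-voxel axis shrinks by a factor of 32 over the five strided blocks, so the feature map entering any pooling or projection layer is very small compared with the input volume.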
