BeSt-LeS: Benchmarking Stroke Lesion Segmentation using Deep Supervision

Prantik Deb; Lalith Bharadwaj Baru; Kamalaker Dadi; Bapi Raju S

TL;DR

This work benchmarks end-to-end U-Net-style models for stroke lesion segmentation on ATLAS v2.0, using both 2D and 3D T1-weighted MRI. It evaluates five 2D architectures (U-Net, Residual U-Net, Attention U-Net, TransAttn U-Net, and U-Net Transformer) and three 3D variants (U-Net, Residual U-Net, and Attention U-Net). The best Dice scores are $0.583$ for the 2D Transformer-based model and $0.504$ for the 3D Residual U-Net. A Wilcoxon Signed Rank Test on predicted versus true lesion volumes shows significant agreement for the 3D U-Net ($p\approx 0.0$, $\rho=0.949$) and 3D Residual U-Net ($p=0.001$, $\rho=0.962$), but not for the 3D Attention U-Net ($p=0.540$, $\rho=0.844$). The authors release their code on GitHub and discuss limitations such as the absence of data augmentation and of alternative supervision strategies. For future work they recommend augmentation, multi-modality data, and cascaded attention.

Abstract

Brain stroke has become a significant burden on global health, and remedies and prevention strategies are needed to overcome this challenge. Immediate identification of stroke and risk stratification are therefore the primary tasks for clinicians. To aid expert clinicians, automated segmentation models are crucial. In this work, we use the publicly available ATLAS v2.0 dataset to benchmark various end-to-end supervised U-Net style models. Specifically, we benchmark models on both 2D and 3D brain images and evaluate them using standard metrics. The highest Dice scores are 0.583 for the 2D transformer-based model and 0.504 for the 3D residual U-Net. We also conduct the Wilcoxon test for the 3D models to assess the relationship between predicted and actual stroke volume. For reproducibility, the code and model weights are made publicly available: https://github.com/prantik-pdeb/BeSt-LeS.
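
The Dice score quoted above measures the overlap between a predicted lesion mask and the ground-truth mask: twice the number of shared lesion voxels divided by the total number of lesion voxels in both masks. The following is a minimal sketch of that standard definition, not the authors' evaluation code. The function name `dice_score`, the flattened-list mask representation, and the convention that two empty masks score 1.0 are assumptions made for illustration.

```python
from typing import Sequence


def dice_score(pred: Sequence[int], truth: Sequence[int]) -> float:
    """Dice overlap between two flattened binary masks (1 = lesion voxel)."""
    if len(pred) != len(truth):
        raise ValueError("masks must contain the same number of voxels")
    # Voxels labelled as lesion in both masks.
    intersection = sum(p * t for p, t in zip(pred, truth))
    total = sum(pred) + sum(truth)
    # Assumed convention: two empty masks count as perfect agreement.
    if total == 0:
        return 1.0
    return 2.0 * intersection / total


# Toy 1D masks standing in for flattened MRI label volumes (made-up data).
pred = [0, 1, 1, 1, 0, 0]
truth = [0, 0, 1, 1, 1, 0]
print(round(dice_score(pred, truth), 3))
```

In the toy example the masks share 2 lesion voxels out of 3 + 3 labelled voxels, so the score is 4/6. A real evaluation would apply the same formula per subject to the full 2D slices or 3D volumes and then average.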

Paper Structure (28 sections, 10 equations, 4 figures, 5 tables)

Introduction
Contributions of this work
Data and Models
Dataset
U-Net Style Architectures
U-Net:
Residual U-Net:
Attention U-Net:
TransAttn U-Net:
U-Net Transformer:
Results and Discussion
Results for 2D
Results for 3D
Wilcoxon Signed Rank Test:
Limitations and Future Directions
...and 13 more sections

Figures (4)

Figure 1: Various U-Net style architectures. The first row shows a diagrammatic view of the convolution-based Transformer models, and the bottom row shows two novel transformer-based U-Net architectures. All symbols and signs are explained in the legend block.
Figure 2: 2D visualizations comparing ground truth and predicted lesions for two subjects, one with a small stroke lesion and one with a large lesion (Subject IDs: sub-r027s050, slice 91, and sub-r004s004, slice 68). The predicted outputs of the convolution-based and transformer-based 2D U-Net models are shown for both subjects. All 2D models performed comparably well, but the U-Net Transformer produced the finest lesion boundaries for both subjects.
Figure 3: Visualization of two test-set subjects (IDs: sub-r001s010 and sub-r011s020) comparing three 3D U-Net models (standard, Residual, and Attention). For each model, the left panel shows the ground truth and the right panel shows the model's prediction. Each image shows the location of the stroke in the brain in three views (sagittal, coronal, and axial).
Figure 4: The Wilcoxon test carried out for all three 3D U-Net style architectures. The first row shows scatter plots and the second row shows box plots of the predicted and actual stroke volume for each 3D model. In the scatter plots, the ideal scenario is for the gray dots to lie on the red line.
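
The volume comparison summarized in Figure 4 pairs each subject's predicted lesion volume with the true volume. A minimal sketch of such an analysis is given below, assuming a two-sided Wilcoxon signed-rank test with the normal approximation (no tie or continuity correction) and a Spearman rank correlation. The source does not state which correlation coefficient $\rho$ denotes or how the test was configured, so both choices are assumptions, and the volumes are made-up toy numbers rather than results from the paper. All function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values from 1..n; tied values receive the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def wilcoxon_signed_rank(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Return (W, two-sided p) using the normal approximation."""
    diffs = [a - b for a, b in zip(x, y) if a != b]  # zero differences are dropped
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")
    ranks = average_ranks([abs(d) for d in diffs])
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    w = min(w_plus, w_minus)
    mean = n * (n + 1) / 4.0
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24.0)
    z = (w - mean) / sd
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided tail of the standard normal
    return w, p


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    var_a = sum((u - ma) ** 2 for u in a)
    var_b = sum((v - mb) ** 2 for v in b)
    return cov / math.sqrt(var_a * var_b)


def spearman(a: Sequence[float], b: Sequence[float]) -> float:
    """Spearman correlation: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(a), average_ranks(b))


# Made-up lesion volumes for six hypothetical subjects.
true_vol = [1.2, 5.0, 12.4, 30.1, 48.0, 75.3]
pred_vol = [1.0, 5.6, 11.0, 27.5, 50.1, 70.2]

w_stat, p_value = wilcoxon_signed_rank(pred_vol, true_vol)
rho = spearman(pred_vol, true_vol)
print(w_stat, round(p_value, 3), rho)
```

The signed-rank test asks whether the paired differences are symmetric about zero, while the rank correlation captures how well the ordering of predicted volumes tracks the ordering of true volumes. With only a handful of subjects, an exact test is preferable to the normal approximation used in this sketch.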
