Longitudinal Vestibular Schwannoma Dataset with Consensus-based Human-in-the-loop Annotations

Navodini Wijethilake, Marina Ivory, Oscar MacCormac, Siddhant Kumar, Aaron Kujawa, Lorena Garcia-Foncillas Macias, Rebecca Burger, Amanda Hitchings, Suki Thomson, Sinan Barazi, Eleni Maratos, Rupert Obholzer, Dan Jiang, Fiona McClenaghan, Kazumi Chia, Omar Al-Salihi, Nick Thomas, Steve Connor, Tom Vercauteren, Jonathan Shapey

TL;DR

This work addresses the challenge of producing trustworthy, large-scale vestibular schwannoma (VS) annotations across heterogeneous MRI data for robust segmentation. It introduces a consensus-based human-in-the-loop annotation pipeline that bootstraps deep learning (DL) segmentation by integrating external Gamma Knife datasets (LDN SC-GK, ETZ SC-GK) with the UK MC-RC cohort, followed by expert validation and refinement. Across three bootstrapping rounds, the Dice Similarity Coefficient (DSC) on the internal validation set improved from 0.9125 to 0.9670 while external performance remained stable, and the approach is estimated to improve annotation efficiency by approximately 37.4% compared with conventional manual annotation. The resulting UK MC-RC-2 dataset (190 patients; 534 annotated T1CE scans from 184 of them) is publicly available on TCIA and demonstrates the method's potential for clinically adaptable, generalisable VS segmentation in diverse clinical settings.

Abstract

Accurate segmentation of vestibular schwannoma (VS) on Magnetic Resonance Imaging (MRI) is essential for patient management but often requires time-intensive manual annotations by experts. While recent advances in deep learning (DL) have facilitated automated segmentation, challenges remain in achieving robust performance across diverse datasets and complex clinical cases. We present an annotated dataset stemming from a bootstrapped DL-based framework for iterative segmentation and quality refinement of VS in MRI. We combine data from multiple centres and rely on expert consensus for trustworthiness of the annotations. We show that our approach enables effective and resource-efficient generalisation of automated segmentation models to a target data distribution. The framework achieved a significant improvement in segmentation accuracy with a Dice Similarity Coefficient (DSC) increase from 0.9125 to 0.9670 on our target internal validation dataset, while maintaining stable performance on representative external datasets. Expert evaluation on 143 scans further highlighted areas for model refinement, revealing nuanced cases where segmentation required expert intervention. The proposed approach is estimated to enhance efficiency by approximately 37.4% compared to the conventional manual annotation process. Overall, our human-in-the-loop model training approach achieved high segmentation accuracy, highlighting its potential as a clinically adaptable and generalisable strategy for automated VS segmentation in diverse clinical settings. The dataset includes 190 patients, with tumour annotations available for 534 longitudinal contrast-enhanced T1-weighted (T1CE) scans from 184 patients, and non-annotated T2-weighted scans from 6 patients. This dataset is publicly accessible on The Cancer Imaging Archive (TCIA) (https://doi.org/10.7937/bq0z-xa62).
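The DSC values quoted above measure the overlap between a predicted and a reference tumour mask. As a minimal sketch of the standard definition, DSC = 2|A ∩ B| / (|A| + |B|), the snippet below computes it for two flattened binary masks. The function name `dice_similarity`, the toy masks, and the choice to return 1.0 when both masks are empty are illustrative assumptions, not details taken from the paper's evaluation code.

```python
from typing import Iterable


def dice_similarity(pred: Iterable[int], ref: Iterable[int]) -> float:
    """Dice Similarity Coefficient for two flattened binary masks.

    Hypothetical helper for illustration; a real pipeline would typically
    operate on 3D arrays loaded from the segmentation files.
    """
    pred_mask = [bool(v) for v in pred]
    ref_mask = [bool(v) for v in ref]
    if len(pred_mask) != len(ref_mask):
        raise ValueError("masks must contain the same number of voxels")

    # Voxels labelled as tumour in both masks.
    intersection = sum(p and r for p, r in zip(pred_mask, ref_mask))
    total = sum(pred_mask) + sum(ref_mask)

    if total == 0:
        # Assumption: two empty masks count as perfect agreement.
        return 1.0
    return 2.0 * intersection / total


# Toy example: 4 predicted voxels, 4 reference voxels, 3 shared.
prediction = [0, 1, 1, 1, 0, 0, 1, 0]
reference = [0, 1, 1, 0, 0, 1, 1, 0]
score = dice_similarity(prediction, reference)
print(f"DSC = {score:.2f}")  # DSC = 0.75
```

A DSC of 1.0 means the two masks coincide exactly and 0.0 means they share no voxels, so the reported change from 0.9125 to 0.9670 corresponds to closer agreement with the reference annotations.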

Paper Structure

This paper contains 5 sections, 7 figures, 3 tables.

Figures (7)

  • Figure 1: Pipeline for data curation and DL-based, collaborative, iterative generation of quality-controlled VS segmentations.
  • Figure 2: Overall patient flow from the original uncurated UK MC-RC-2 dataset to the curated UK MC-RC-2 dataset.
  • Figure 3: Comparison of the internal UK multi-centre routine clinical-2 (UK MC-RC-2) annotation dataset with the external single-centre Gamma Knife (LDN SC-GK and ETZ SC-GK) and UK MC-RC datasets. Distributions of (A) slice thickness, (B) image resolution in terms of voxel volume, and (C) number of slices in each image. The standardised acquisition protocol of the SC-GK datasets results in homogeneous distributions across scans, while the heterogeneity in the MC-RC datasets reflects varied acquisition settings and protocols, leading to differences in slice thickness, resolution, and slice counts.
  • Figure 4: Session flow from the data annotation pipeline.
  • Figure 5: (A) DSC distribution on the external test set and the internal UK MC-RC-2 validation set. A statistically significant improvement (p < 0.05) is observed between Rounds 1 and 2 and between Rounds 1 and 3. (B) Three sample segmentations from the internal UK MC-RC-2 validation set: (1) and (2) show improved DSC, while (3) shows a loss in performance, with the segmentation missing the large peritumoural cystic region.
  • ...and 2 more figures