Longitudinal Vestibular Schwannoma Dataset with Consensus-based Human-in-the-loop Annotations
Navodini Wijethilake, Marina Ivory, Oscar MacCormac, Siddhant Kumar, Aaron Kujawa, Lorena Garcia-Foncillas Macias, Rebecca Burger, Amanda Hitchings, Suki Thomson, Sinan Barazi, Eleni Maratos, Rupert Obholzer, Dan Jiang, Fiona McClenaghan, Kazumi Chia, Omar Al-Salihi, Nick Thomas, Steve Connor, Tom Vercauteren, Jonathan Shapey
TL;DR
This work addresses the challenge of producing trustworthy, large-scale vestibular schwannoma (VS) annotations across heterogeneous MRI data to support robust segmentation. It introduces a consensus-based human-in-the-loop annotation pipeline that bootstraps deep learning (DL) segmentation by combining external Gamma Knife datasets (LDN SC-GK, ETZ SC-GK) with the UK MC-RC cohort, followed by expert validation and refinement. Across three bootstrapping rounds, the Dice Similarity Coefficient (DSC) on the internal validation set improved from 0.9125 to 0.9670 while external performance remained stable, and the approach is estimated to improve annotation efficiency by about 37.4% over fully manual annotation. The resulting UK MC-RC-2 dataset (190 patients; 534 annotated T1CE scans from 184 of them, plus non-annotated T2-weighted scans from the remaining 6) is publicly available on TCIA and demonstrates the method's potential for clinically adaptable, generalisable VS segmentation in diverse clinical settings.
Abstract
Accurate segmentation of vestibular schwannoma (VS) on Magnetic Resonance Imaging (MRI) is essential for patient management but often requires time-intensive manual annotation by experts. While recent advances in deep learning (DL) have facilitated automated segmentation, challenges remain in achieving robust performance across diverse datasets and complex clinical cases. We present an annotated dataset stemming from a bootstrapped DL-based framework for iterative segmentation and quality refinement of VS in MRI. We combine data from multiple centres and rely on expert consensus to ensure the trustworthiness of the annotations. We show that our approach enables effective and resource-efficient generalisation of automated segmentation models to a target data distribution. The framework achieved a significant improvement in segmentation accuracy, with the Dice Similarity Coefficient (DSC) increasing from 0.9125 to 0.9670 on our target internal validation dataset, while maintaining stable performance on representative external datasets. Expert evaluation on 143 scans further highlighted areas for model refinement, revealing nuanced cases where segmentation required expert intervention. The proposed approach is estimated to enhance efficiency by approximately 37.4% compared to the conventional manual annotation process. Overall, our human-in-the-loop model training approach achieved high segmentation accuracy, highlighting its potential as a clinically adaptable and generalisable strategy for automated VS segmentation in diverse clinical settings. The dataset includes 190 patients, with tumour annotations available for 534 longitudinal contrast-enhanced T1-weighted (T1CE) scans from 184 patients, and non-annotated T2-weighted scans from the remaining 6 patients. This dataset is publicly accessible on The Cancer Imaging Archive (TCIA) (https://doi.org/10.7937/bq0z-xa62).
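For readers less familiar with the Dice Similarity Coefficient reported above, it is defined for a predicted mask A and a reference mask B as DSC = 2|A ∩ B| / (|A| + |B|). The following is a minimal illustrative sketch only, not the authors' evaluation code. It assumes each mask is represented as a set of voxel indices; the function name dice_similarity and the convention of returning 1.0 for two empty masks are assumptions made for this example.

```python
from typing import Iterable, Tuple

# A voxel is identified by its (x, y, z) index in the image grid.
Voxel = Tuple[int, int, int]


def dice_similarity(pred: Iterable[Voxel], ref: Iterable[Voxel]) -> float:
    """Return DSC = 2|A ∩ B| / (|A| + |B|) for two voxel-index sets.

    Hypothetical helper for illustration; the empty-vs-empty case
    returning 1.0 is an assumed convention, not taken from the paper.
    """
    pred_set, ref_set = set(pred), set(ref)
    total = len(pred_set) + len(ref_set)
    if total == 0:
        # Both masks are empty: treat as perfect agreement (assumption).
        return 1.0
    return 2.0 * len(pred_set & ref_set) / total


# Toy example: a 4x4 single-slice reference mask and a prediction
# shifted by one voxel along x, so 12 of the 16 voxels overlap.
reference = {(x, y, 0) for x in range(4) for y in range(4)}
predicted = {(x, y, 0) for x in range(1, 5) for y in range(4)}

print(dice_similarity(predicted, reference))  # 2 * 12 / (16 + 16) = 0.75
```

In this toy case a one-voxel shift already lowers the score to 0.75, which gives some intuition for how close to the reference a segmentation must be to reach values above 0.9.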
