Deep learning-based brain segmentation model performance validation with clinical radiotherapy CT
Selena Huisman, Matteo Maspero, Marielle Philippens, Joost Verhoeff, Szabolcs David
TL;DR
This study validates a contrast- and resolution-robust brain segmentation model, SynthSeg, on CT data by leveraging 260 paired CT/MRI scans from multiple centers. Using region-wise QC scores, the authors show that CT-based segmentation approaches MRI-based performance in general, achieving a median Dice of $0.76$ and $\mathrm{HD}_{95}$ of $2.95$ mm, with QC filtering improving these metrics. They demonstrate that CT preserves key sex- and age-related brain-volume relationships similar to MRI, despite systematic volume differences between modalities. The work suggests CT-based brain segmentation is viable for non-precision-critical neuroanatomical studies and potentially adaptive radiotherapy, provided QC gates are employed and regions with poor quality are excluded.
Abstract
Manual segmentation of medical images is labor intensive and especially challenging for images with poor contrast or resolution. The presence of disease makes the task harder still, increasing the need for an automated solution. To this end, SynthSeg is a robust deep learning model designed for automatic brain segmentation across various contrasts and resolutions. This study validates the SynthSeg robust brain segmentation model on computed tomography (CT), using a multi-center dataset. An open-access dataset of 260 paired CT and magnetic resonance imaging (MRI) scans from radiotherapy patients treated in 5 centers was collected. Brain segmentations from CT and MRI were obtained with the SynthSeg model, a component of the FreeSurfer imaging suite. These segmentations were compared and evaluated using Dice scores and the 95th percentile Hausdorff distance (HD95), treating MRI-based segmentations as the ground truth. Brain regions that failed to meet performance criteria were excluded based on automated quality control (QC) scores. Dice scores indicate a median overlap of 0.76 (IQR: 0.65-0.83). The median HD95 is 2.95 mm (IQR: 1.73-5.39). QC-score-based thresholding improves the median Dice by 0.1 and the median HD95 by 0.05 mm. Morphological differences related to sex and age, as detected by MRI, were also replicated with CT, with an approximate 17% difference between the CT and MRI results for sex and a 10% difference between the results for age. SynthSeg can be utilized for CT-based automatic brain segmentation, but only in applications where precision is not essential. CT performance is lower than MRI performance based on the integrated QC scores, but low-quality segmentations can be excluded with QC-based thresholding. Additionally, performing CT-based neuroanatomical studies is encouraged, as the results show correlations in sex- and age-based analyses similar to those found with MRI.
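To make the two evaluation metrics concrete, the following is a minimal sketch of how a Dice score and an HD95 could be computed for one binary region mask, with the MRI-based mask playing the role of the reference. It is not the authors' implementation. The function names, the 6-neighbour surface definition, and the choice of taking the larger of the two directed 95th percentiles are assumptions made for illustration (other HD95 conventions exist), and the brute-force distance matrix is only practical for small masks.

```python
import numpy as np


def dice_score(a, b):
    """Dice overlap between two binary masks (NaN if both are empty)."""
    a = np.asarray(a, dtype=bool)
    b = np.asarray(b, dtype=bool)
    denom = a.sum() + b.sum()
    if denom == 0:
        return float("nan")
    return float(2.0 * np.logical_and(a, b).sum() / denom)


def surface_points(mask, spacing):
    """Physical coordinates of mask voxels that touch background.

    Assumption: a voxel is on the surface if any of its face-adjacent
    neighbours lies outside the mask (the image border counts as outside).
    """
    mask = np.asarray(mask, dtype=bool)
    padded = np.pad(mask, 1, mode="constant", constant_values=False)
    interior = padded.copy()
    for axis in range(mask.ndim):
        interior &= np.roll(padded, 1, axis=axis)
        interior &= np.roll(padded, -1, axis=axis)
    core = tuple(slice(1, -1) for _ in range(mask.ndim))
    surface = mask & ~interior[core]
    return np.argwhere(surface) * np.asarray(spacing, dtype=float)


def hd95(a, b, spacing=(1.0, 1.0, 1.0)):
    """95th percentile Hausdorff distance in the units of `spacing`.

    Assumption: computed as the maximum of the two directed
    95th-percentile surface distances. Returns NaN for an empty mask.
    """
    pa = surface_points(a, spacing)
    pb = surface_points(b, spacing)
    if len(pa) == 0 or len(pb) == 0:
        return float("nan")
    # Brute-force pairwise distances; fine for small illustrative masks.
    d = np.linalg.norm(pa[:, None, :] - pb[None, :, :], axis=-1)
    a_to_b = d.min(axis=1)
    b_to_a = d.min(axis=0)
    return float(max(np.percentile(a_to_b, 95), np.percentile(b_to_a, 95)))


if __name__ == "__main__":
    # Toy example: a 4x4x4 cube and the same cube shifted by one voxel.
    mri_mask = np.zeros((10, 10, 10), dtype=bool)
    mri_mask[2:6, 2:6, 2:6] = True
    ct_mask = np.zeros((10, 10, 10), dtype=bool)
    ct_mask[3:7, 2:6, 2:6] = True
    print("Dice:", dice_score(ct_mask, mri_mask))
    print("HD95 (mm):", hd95(ct_mask, mri_mask, spacing=(1.0, 1.0, 1.0)))
```

In a study like this one, such functions would be applied per region label and per subject, after which medians and interquartile ranges can be taken across the resulting scores. For full-resolution volumes, a distance-transform or KD-tree approach would be a more practical replacement for the pairwise distance matrix.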
