A unified FLAIR hyperintensity segmentation model for various CNS tumor types and acquisition time points

Mathilde Gajda Faanes, David Bouget, Asgeir S. Jakola, Timothy R. Smith, Vasileios K. Kavouridis, Francesco Latini, Margret Jensdottir, Peter Milos, Henrietta Nittby Redebrandt, Rickard L. Sjöberg, Rupavathana Mahesparan, Lars Kjelsberg Pedersen, Ole Solheim, Ingerid Reinertsen

TL;DR

The study addresses automatic segmentation of FLAIR hyperintensity across diverse CNS tumor types and acquisition time points by training a unified Attention U‑Net model on a large, multi-center dataset. It shows that the unified model generalizes across meningiomas, metastases, and gliomas, as well as across pre- and post-operative scans, achieving Dice scores comparable to those of tumor-type-specific models and performing competitively on BraTS benchmarks while using only FLAIR inputs. The work also analyzes detection and clinically oriented volume metrics, explores deriving the surrounding non-enhancing FLAIR hyperintensity (SNFH) by subtracting the tumor, and integrates the model into Raidionics for open-source clinical use. While the results are promising, the authors note limitations arising from ground-truth annotation variability and from the segmentation of small volumes, underscoring the need for standardized labeling and further clinical validation before robust deployment.
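
The SNFH-by-subtraction idea mentioned above can be sketched as a voxel-wise set difference between a total FLAIR hyperintensity (FH) mask and a tumor mask, followed by a volume computation. This is a minimal sketch under stated assumptions, not the authors' pipeline: it assumes both masks are binary, co-registered NumPy arrays of equal shape, and the function names `snfh_by_subtraction` and `volume_ml` as well as reporting volumes in millilitres are illustrative choices.

```python
import numpy as np


def snfh_by_subtraction(fh_mask: np.ndarray, tumor_mask: np.ndarray) -> np.ndarray:
    """Voxels that are FLAIR-hyperintense (FH) but not part of the tumor mask.

    Assumption: both masks are binary, co-registered and of equal shape.
    Tumor voxels lying outside the FH mask are simply ignored.
    """
    if fh_mask.shape != tumor_mask.shape:
        raise ValueError("FH and tumor masks must have the same shape")
    return np.logical_and(fh_mask.astype(bool), ~tumor_mask.astype(bool))


def volume_ml(mask: np.ndarray, spacing_mm: tuple[float, float, float]) -> float:
    """Volume of a binary mask in millilitres, given voxel spacing in mm (1 ml = 1000 mm^3)."""
    voxel_mm3 = float(np.prod(spacing_mm))
    return float(np.count_nonzero(mask)) * voxel_mm3 / 1000.0


# Toy example: a 6x6x6 FH cube with a 2x2x2 "tumor" in its centre.
fh = np.zeros((10, 10, 10), dtype=bool)
fh[2:8, 2:8, 2:8] = True
tumor = np.zeros_like(fh)
tumor[4:6, 4:6, 4:6] = True

snfh = snfh_by_subtraction(fh, tumor)
print(int(snfh.sum()))                   # 208 voxels (216 - 8)
print(volume_ml(snfh, (1.0, 1.0, 1.0)))  # 0.208 ml at 1 mm isotropic spacing
```

With real scans, the voxel spacing would come from the image header rather than being hard-coded as in this toy example.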

Abstract

T2-weighted fluid-attenuated inversion recovery (FLAIR) magnetic resonance imaging (MRI) scans are important for diagnosis, treatment planning and monitoring of brain tumors. Depending on the brain tumor type, the FLAIR hyperintensity volume is an important measure for assessing the tumor volume or the surrounding edema, and an automatic segmentation of this volume would be useful in the clinic. In this study, around 5000 FLAIR images of various tumor types and acquisition time points from different centers were used to train a unified FLAIR hyperintensity segmentation model using an Attention U-Net architecture. The performance was compared against dataset-specific models, and the model was validated on different tumor types and acquisition time points and benchmarked against BraTS. The unified model achieved an average Dice score of 88.65% for pre-operative meningiomas, 80.08% for pre-operative metastases, 90.92% for pre-operative and 84.60% for post-operative gliomas from BraTS, and 84.47% for pre-operative and 61.27% for post-operative lower-grade gliomas. In addition, the results showed that the unified model achieved segmentation performance comparable to that of the dataset-specific models on their respective datasets and generalizes across tumor types and acquisition time points, which facilitates deployment in a clinical setting. The model is integrated into Raidionics, an open-source software package for CNS tumor analysis.
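
The Dice scores reported above measure voxel overlap between a predicted mask P and a ground-truth mask T as 2|P ∩ T| / (|P| + |T|). The sketch below shows one common way to compute and average them, assuming binary NumPy masks. The per-case averaging and the convention that two empty masks score 1.0 are assumptions here; the paper's exact handling of empty or missed cases may differ.

```python
from collections.abc import Iterable

import numpy as np


def dice_score(pred: np.ndarray, truth: np.ndarray) -> float:
    """Dice similarity coefficient 2|P & T| / (|P| + |T|) for binary masks."""
    pred = pred.astype(bool)
    truth = truth.astype(bool)
    denom = int(pred.sum()) + int(truth.sum())
    if denom == 0:
        # Assumed convention: two empty masks agree perfectly.
        return 1.0
    return 2.0 * int(np.logical_and(pred, truth).sum()) / denom


def mean_dice_percent(pairs: Iterable[tuple[np.ndarray, np.ndarray]]) -> float:
    """Average per-case Dice over (prediction, ground truth) pairs, in percent."""
    scores = [dice_score(p, t) for p, t in pairs]
    if not scores:
        raise ValueError("need at least one case")
    return 100.0 * float(np.mean(scores))
```

For example, a prediction covering 4 voxels that all lie inside an 8-voxel ground truth gives 2·4 / (4 + 8) ≈ 0.667, i.e. about 66.7%.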

Paper Structure

This paper contains 3 sections, 4 figures, 4 tables.

Figures (4)

  • Figure 1: Illustration of the data used in this project, showing the data origin with the number of images in parentheses, the annotation types, the distribution into different subgroups for model training, and the average FLAIR hyperintensity volumes with standard deviations. Group A represents the private LGG dataset of tumors classified as supratentorial diffuse WHO grade II gliomas according to the 2007 or 2016 WHO classification system (Louis et al., 2007; Louis et al., 2016), with the tumor annotations used as labels for the FLAIR hyperintensity (FH) volume. Group B represents the BraTS data, with annotations of the tumor and the surrounding non-enhancing FLAIR hyperintensity (SNFH) volume, which were merged to produce the total FLAIR hyperintensity (FH) volume. Group A is shown in red-based colors and group B in blue-based colors. Light colors represent pre-operative cases and dark colors represent post-operative cases.
  • Figure 2: Illustration of the experiment preparation and of the pipeline for training and evaluating segmentation models. The figure gives an overview of the fold splitting, in which the data were split evenly with respect to tumor type, source and size (represented by colors), and of how different subgroups of the data were used as input for training and evaluation. Note that each model was evaluated on every subgroup it was trained on, and that, since the fold splits were kept fixed, the test folds for each subgroup are the same for all models.
  • Figure 3: Scatter plot of the object-wise Dice scores for the unified model (A_B_pre_post) on the different test sets, grouped by brain tumor type and by pre- or post-operative status along the Y-axis, against volume on a logarithmic scale along the X-axis. All test cases are shown, including false negatives.
  • Figure 4: Examples of predictions (pink) and ground truth (green). The top and second rows show cases with high and low Dice scores, respectively. From left to right, the samples are from Men_B_pre, Met_B_pre, Gli_B_pre, Gli_B_post, Gli_A_pre and Gli_A_post. The last row shows examples of Gli_A_post with low scores.
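
Figure 2 describes folds split evenly with respect to tumor type, source and size and kept fixed across all models. One way to achieve this, sketched below under stated assumptions and not necessarily the authors' procedure, is to group cases into strata by (tumor type, source, volume bin), shuffle each stratum with a fixed seed, and deal the cases round-robin over the folds. The dictionary keys, volume bin edges, fold count and seed are all hypothetical.

```python
import random
from collections import defaultdict


def stratified_folds(cases, n_folds=5, volume_edges_ml=(10.0, 50.0), seed=42):
    """Map each case id to a fold index in [0, n_folds).

    `cases` is a list of dicts with hypothetical keys "id", "tumor_type",
    "source" and "volume_ml". Cases are grouped into strata by
    (tumor type, source, volume bin), shuffled within each stratum and then
    dealt round-robin over the folds, so every stratum is spread as evenly
    as possible. The volume edges, fold count and seed are illustrative.
    """
    strata = defaultdict(list)
    for case in cases:
        # Number of edges the volume reaches = index of its size bin.
        size_bin = sum(case["volume_ml"] >= edge for edge in volume_edges_ml)
        strata[(case["tumor_type"], case["source"], size_bin)].append(case["id"])

    rng = random.Random(seed)  # fixed seed -> identical folds on every run
    folds = {}
    position = 0
    for key in sorted(strata):
        ids = strata[key]
        rng.shuffle(ids)
        for case_id in ids:
            # Keep dealing from where the previous stratum stopped so that
            # small strata do not all start at fold 0.
            folds[case_id] = position % n_folds
            position += 1
    return folds
```

Because the seed is fixed, the same case-to-fold mapping can be reused when training models on different subgroups, which mirrors the fixed test folds noted in the Figure 2 caption.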