Segmentation variability and radiomics stability for predicting Triple-Negative Breast Cancer subtype using Magnetic Resonance Imaging

Isabella Cama, Alejandro Guzmán, Cristina Campi, Michele Piana, Karim Lekadir, Sara Garbarino, Oliver Díaz

TL;DR

This study evaluates how segmentation variability affects radiomic stability and the prediction of Triple-Negative Breast Cancer (TNBC) using MRI. By perturbing manual tumor segmentations and applying SHAP-based feature selection across multiple segmentation masks, the authors show that predictive performance can remain stable even when segmentation accuracy varies, and that reliance on ICC alone may discard valuable features. The findings suggest that carefully selected, explainable radiomic features can rival biopsy-derived information for TNBC classification, while highlighting the limited utility of stability metrics as sole feature-selection criteria. Overall, the work contributes to understanding the interaction between segmentation variability, feature stability, and predictive power, with implications for more robust, generalizable radiomics approaches in breast cancer.

Abstract

Most papers caution against using predictive models for disease stratification based on unselected radiomic features, as these features are affected by contouring variability. Instead, they advocate for the use of the Intraclass Correlation Coefficient (ICC) as a measure of stability for feature selection. However, the direct effect of segmentation variability on the predictive models is rarely studied. This study investigates the impact of segmentation variability on feature stability and predictive performance in radiomics-based prediction of Triple-Negative Breast Cancer (TNBC) subtype using Magnetic Resonance Imaging. A total of 244 images from the Duke dataset were used, with segmentation variability introduced through modifications of manual segmentations. For each mask, explainable radiomic features were selected using the Shapley Additive exPlanations method and used to train logistic regression models. Feature stability across segmentations was assessed via ICC, Pearson's correlation, and reliability scores quantifying the relationship between feature stability and segmentation variability. Results indicate that segmentation accuracy does not significantly impact predictive performance. While incorporating peritumoral information may reduce feature reproducibility, it does not diminish feature predictive capability. Moreover, feature selection in predictive models is not inherently tied to feature stability with respect to segmentation, suggesting that an overreliance on ICC or reliability scores for feature selection might exclude valuable predictive features.
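The SHAP-then-logistic-regression step described above can be illustrated with a minimal sketch. It is not the authors' pipeline: the data are synthetic, the helper names (`fit_logistic`, `linear_shap`, `select_top_k`) are hypothetical, and the closed-form SHAP values for a linear model assume independent features and are expressed in log-odds space. The paper's actual selection procedure is the one shown in its Figure 2.

```python
import numpy as np

def fit_logistic(X, y, lr=0.5, n_iter=1000, l2=1e-3):
    """Gradient-descent logistic regression (stand-in for any library fit)."""
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))   # predicted probabilities
        w -= lr * (X.T @ (p - y) / n + l2 * w)   # regularized gradient step
        b -= lr * np.mean(p - y)
    return w, b

def linear_shap(X, w):
    """SHAP values of a linear model in log-odds space.

    Assumption: features are treated as independent, so the attribution of
    feature j for sample i is w_j * (x_ij - mean_j).
    """
    return (X - X.mean(axis=0)) * w

def select_top_k(X, y, k):
    """Rank features by mean absolute SHAP value and keep the k largest."""
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)    # standardize features
    w, _ = fit_logistic(Xs, y)
    importance = np.abs(linear_shap(Xs, w)).mean(axis=0)
    return np.argsort(importance)[::-1][:k], importance

# Synthetic stand-in for a radiomic feature matrix: only columns 0 and 3
# carry signal about the binary label.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 6))
logits = 2.0 * X[:, 0] - 1.5 * X[:, 3]
y = (rng.random(500) < 1.0 / (1.0 + np.exp(-logits))).astype(float)

top, importance = select_top_k(X, y, k=2)
print("selected feature indices:", sorted(top.tolist()))
```

Repeating this selection once per segmentation mask, as the abstract describes, would give one feature subset per mask, and those subsets can then be compared across masks.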

Paper Structure

This paper contains 11 sections, 7 figures, 1 table, 1 algorithm.

Figures (7)

  • Figure 1: Left to right: MR slices of three different patients (first post-contrast image). Top to bottom: original image, manual tumor segmentation (red), closing 08 mask (orange), closing 07 (green), closing 06 (blue), and ellipsoid 04 (magenta). For the definition of 'closing mask', see the paper's section on segmentation.
  • Figure 2: Flowchart of the feature selection methodology employed for this study, based on SHAP explainability algorithm.
  • Figure 3: Top row, from left to right: example slice of original DCE-MR image, Wavelet-HHH filtered image, Wavelet-HLH filtered image, and LoG $\sigma=3$ filtered image. Bottom row, from left to right: zoomed view of the manual segmentation mask on the original and filtered images. Tumor segmentation is shown in red.
  • Figure 4: ROC-AUC scores obtained by testing the demographic model (boxplot $1$), the biopsy model (boxplot $2$), the baseline models (boxplots $3$–$7$), and the best-SHAP models (boxplots $8$–$12$).
  • Figure 5: Top panel: ICC of the four common best-SHAP features for each segmentation mask; the dashed lines indicate the median ICC over all features, summarizing the reproducibility between features extracted from the manual mask and from each of its modifications. Bottom panel: Pearson's correlation of the same four features for each mask; the dashed lines indicate the median correlation over all features, summarizing the correlation between features extracted from the manual mask and from each of its modifications.
  • ...and 2 more figures
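
The two agreement measures named in the Figure 5 caption can behave differently on the same data, which a small sketch makes concrete. The ICC form used here, ICC(2,1) (two-way random effects, absolute agreement, single measurement), is an assumption, since this summary does not state which variant the paper uses. The function names and the toy values are hypothetical.

```python
import numpy as np

def icc_2_1(x):
    """ICC(2,1) for an (n_subjects, k_raters) array.

    Here a 'rater' is a segmentation mask and a 'subject' is a patient;
    each column holds one feature's values under one mask.
    """
    x = np.asarray(x, dtype=float)
    n, k = x.shape
    grand = x.mean()
    ss_rows = k * np.sum((x.mean(axis=1) - grand) ** 2)   # between subjects
    ss_cols = n * np.sum((x.mean(axis=0) - grand) ** 2)   # between masks
    ss_err = np.sum((x - grand) ** 2) - ss_rows - ss_cols  # residual
    msr = ss_rows / (n - 1)
    msc = ss_cols / (k - 1)
    mse = ss_err / ((n - 1) * (k - 1))
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)

def pearson_r(a, b):
    """Pearson's correlation between two 1-D arrays."""
    return float(np.corrcoef(a, b)[0, 1])

# Toy feature values for four patients: the second mask shifts every value
# by a constant +2 (a systematic offset, no change in ranking).
manual = np.array([1.0, 2.0, 3.0, 4.0])
modified = manual + 2.0

icc = icc_2_1(np.column_stack([manual, modified]))
r = pearson_r(manual, modified)
print(f"ICC(2,1) = {icc:.4f}, Pearson r = {r:.4f}")
```

In this toy case Pearson's correlation is 1 while ICC(2,1) is 5/11 (about 0.45): the absolute-agreement ICC penalizes the systematic offset, whereas correlation ignores it. That gap is one reason the two measures in the figure need not tell the same story for a given feature.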