Statistical modeling of breast cancer radiomic features and hazard using image registration-aided longitudinal CT data

Subrata Mukherjee, Qian Cao, Thibaud Coroller, Ravi K. Samala, Nicholas Petrick, Berkman Sahiner

Abstract

Patients with metastatic breast cancer (mBC) undergo repeated computed tomography (CT) imaging during treatment to monitor disease progression. Accurate longitudinal tracking of individual lesions across scans from multiple radiologists is essential for reliable radiomic analysis and clinical decision-making. We conducted a retrospective study using serial chest CT scans from the Phase III MONALEESA-3 and MONALEESA-7 trials and developed statistical models for multi-source data integration and survival analysis. First, we introduced a Registration-based Automated Matching and Correspondence (RAMAC) algorithm to establish lesion correspondence across annotations from different radiologists and imaging time points using the Hungarian algorithm. Second, using the RAMAC-processed dataset, we developed interpretable radiomic survival models for progression-free survival prediction by combining baseline radiomic features, post-treatment changes at Weeks 8, 16, and 24, and demographic variables. To address the high dimensionality of longitudinal radiomic data, feature reduction was performed using an L1-penalized additive Cox proportional hazards model and best subset selection followed by Cox modeling. Model performance was evaluated using the concordance index (C-index). Incorporating additional imaging time points improved predictive performance, increasing the mean C-index from 0.58 at baseline to 0.64. Joint modeling further showed significant associations between longitudinal radiomic features and survival outcomes over time.
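The matching step in RAMAC can be illustrated with a small Python sketch of the Hungarian algorithm. The abstract does not state the cost function, so this sketch makes several assumptions: lesion centroids are already in a common coordinate frame (for example, after image registration), the cost is the Euclidean distance between centroids, and pairs farther apart than a hypothetical threshold `max_dist_mm` are discarded. The function name `match_lesions` and the toy coordinates are illustrative only and are not taken from the paper.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def match_lesions(centroids_a, centroids_b, max_dist_mm=20.0):
    """Pair lesions from two annotation sets by minimum total centroid distance.

    Assumption: both sets are (n, 3) coordinates in the same registered frame.
    Returns a list of (index_in_a, index_in_b) pairs.
    """
    a = np.asarray(centroids_a, dtype=float)
    b = np.asarray(centroids_b, dtype=float)
    # Pairwise Euclidean distances form the assignment cost matrix.
    cost = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=2)
    # Hungarian-style optimal assignment (handles rectangular matrices).
    rows, cols = linear_sum_assignment(cost)
    # Drop assignments that are too far apart to be the same lesion
    # (the threshold is an assumption of this sketch).
    return [(int(i), int(j)) for i, j in zip(rows, cols)
            if cost[i, j] <= max_dist_mm]


# Toy example: two readers annotate the same patient; reader 2 lists the
# lesions in a different order and marks one extra, unmatched lesion.
reader_1 = [[10.0, 20.0, 30.0], [50.0, 60.0, 70.0]]
reader_2 = [[51.0, 61.0, 69.0], [11.0, 19.0, 30.0], [200.0, 200.0, 200.0]]
pairs = match_lesions(reader_1, reader_2)
print(pairs)
```

The same function could be applied between consecutive time points as well as between readers. A distance threshold is one simple way to leave new or resolved lesions unmatched, though the paper may handle such cases differently.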

Paper Structure

This paper contains 11 sections, 9 equations, 8 figures, 2 tables.

Figures (8)

  • Figure 1: Overview of this study
  • Figure 2: Visualization of the RAMAC algorithm identifying and tracking lesions across time points for one patient in the data pipeline (lesion centroids are marked in yellow; the upper row shows the G1 lesion across time points and the bottom row shows the G2 lesion across time points)
  • Figure 3: Box plots of the 2D radiomic shape features on a log scale, based on the training cohort across time points and stratified by group (ribociclib in orange and placebo in purple). The median is marked by the black line, and the lower and upper edges of the boxes correspond to the 25th (Q1) and 75th (Q3) percentiles
  • Figure 4: Cross-validation (CV) paths illustrating the relationship between the regularization parameter ($\lambda$), Harrell's concordance index (C-index), and the number of non-zero coefficients, based on training data. Each plot represents a different feature set: the upper left plot uses only baseline features (Screening), the upper right plot incorporates two time points (Screening and Week 8), the lower left plot includes three time points (Screening, Week 8, and Week 16), and the lower right plot includes four time points (Screening, Week 8, Week 16, and Week 24). The x-axis denotes log-transformed $\lambda$ values, and the y-axis represents the C-index. The number of non-zero coefficients is annotated at the top of each plot (the value marked in red is the number of non-zero coefficients at the optimal $\lambda$), with red dots indicating the mean C-index across cross-validation folds and error bars representing variability.
  • Figure 5: Performance of shrinkage‐based Cox models on the test dataset across different numbers of longitudinal time points. “C” denotes inclusion of demographic features. Error bars represent the 95% bootstrap confidence intervals, obtained from 1,000 resamples of the test dataset. The figure illustrates the improvement in the C-index as additional time points are incorporated into the model.
  • ...and 3 more figures
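
The evaluation described in the Figure 5 caption (a C-index with 95% bootstrap confidence intervals from 1,000 resamples of the test set) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code. The function names `harrell_c_index` and `bootstrap_ci` are hypothetical, the toy data are invented, tied event times are simply skipped, and a percentile bootstrap is assumed because the caption does not name the interval type.

```python
import numpy as np


def harrell_c_index(time, event, risk):
    """Harrell's C: fraction of comparable pairs ranked correctly by risk.

    A pair (i, j) is comparable when subject i had an event and a strictly
    shorter follow-up time than subject j. Tied risks count as 0.5.
    Simplification of this sketch: pairs with tied times are ignored.
    """
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=bool)
    risk = np.asarray(risk, dtype=float)
    concordant, comparable = 0.0, 0
    for i in range(len(time)):
        if not event[i]:
            continue  # censored subjects cannot anchor a comparable pair
        for j in range(len(time)):
            if time[i] < time[j]:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5
    return concordant / comparable if comparable else float("nan")


def bootstrap_ci(time, event, risk, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the C-index (assumed interval type)."""
    rng = np.random.default_rng(seed)
    time, event, risk = (np.asarray(x) for x in (time, event, risk))
    n = len(time)
    stats = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)  # resample subjects with replacement
        c = harrell_c_index(time[idx], event[idx], risk[idx])
        if not np.isnan(c):  # skip resamples with no comparable pairs
            stats.append(c)
    lower, upper = np.percentile(stats, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return float(lower), float(upper)


# Invented toy data: follow-up time, event indicator (1 = progression),
# and a model risk score where higher means earlier expected progression.
toy_time = [5, 8, 12, 3, 9, 15, 7, 11]
toy_event = [1, 1, 0, 1, 0, 1, 1, 0]
toy_risk = [0.9, 0.6, 0.3, 0.8, 0.7, 0.1, 0.5, 0.4]
c_hat = harrell_c_index(toy_time, toy_event, toy_risk)
ci_low, ci_high = bootstrap_ci(toy_time, toy_event, toy_risk)
print(c_hat, (ci_low, ci_high))
```

The quadratic pair loop keeps the definition explicit. For real cohorts, a tested survival-analysis library would be the safer choice.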