Enhancing predictive imaging biomarker discovery through treatment effect analysis

Shuhan Xiao; Lukas Klein; Jens Petersen; Philipp Vollmuth; Paul F. Jaeger; Klaus H. Maier-Hein

TL;DR

The paper defines and tackles the problem of discovering predictive imaging biomarkers directly from pre-treatment images by framing it within a causal, conditional average treatment effect (CATE) paradigm. It introduces an image-based, two-headed TARNet–like estimator to capture treatment effect heterogeneity and outlines a dual evaluation protocol: statistical testing of biomarker–treatment interactions and attribution-based interpretation to verify predictive vs. prognostic roles. Through semi-synthetic experiments across four diverse datasets, the study demonstrates that the proposed approach can identify predictive imaging biomarkers and quantify their strength relative to prognostic effects, with qualitative XAI analyses providing insight into the contributing image features. While promising, the work notes limitations related to linear biomarker–outcome relations, semi-synthetic data, and dataset-specific challenges, and it outlines future directions for non-linear modeling, survival outcomes, and handling confounding in observational data to broaden applicability and robustness.

Abstract

Identifying predictive covariates, which forecast individual treatment effectiveness, is crucial for decision-making across different disciplines such as personalized medicine. These covariates, referred to as biomarkers, are extracted from pre-treatment data, often within randomized controlled trials, and should be distinguished from prognostic biomarkers, which are independent of treatment assignment. Our study focuses on discovering predictive imaging biomarkers, specific image features, by leveraging pre-treatment images to uncover new causal relationships. Unlike labor-intensive approaches relying on handcrafted features prone to bias, we present a novel task of directly learning predictive features from images. We propose an evaluation protocol to assess a model's ability to identify predictive imaging biomarkers and differentiate them from purely prognostic ones by employing statistical testing and a comprehensive analysis of image feature attribution. We explore the suitability of deep learning models originally developed for estimating the conditional average treatment effect (CATE) for this task, which have been assessed primarily for their precision of CATE estimation while overlooking the evaluation of imaging biomarker discovery. Our proof-of-concept analysis demonstrates the feasibility and potential of our approach in discovering and validating predictive imaging biomarkers from synthetic outcomes and real-world image datasets. Our code is available at https://github.com/MIC-DKFZ/predictive_image_biomarker_analysis.
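For readers unfamiliar with the CATE, the standard definitions can be written out as follows. This is a sketch of the usual potential-outcomes notation, not a reproduction of the paper's four equations: the covariate symbol $X$ and the head functions $\hat{\mu}_0$, $\hat{\mu}_1$ are notation assumed here for illustration, and the last identity is the generic form of a two-headed estimator rather than a confirmed detail of the authors' model.

```latex
% Individual treatment effect (never observed directly, since only one
% potential outcome is realized per individual):
\tau_i = Y_i(T=1) - Y_i(T=0)

% Conditional average treatment effect for pre-treatment covariates x:
\tau(x) = \mathbb{E}\left[\, Y(T=1) - Y(T=0) \mid X = x \,\right]

% Under randomized treatment assignment, this is identified from observed data:
\tau(x) = \mathbb{E}\left[\, Y \mid X = x,\, T = 1 \,\right]
        - \mathbb{E}\left[\, Y \mid X = x,\, T = 0 \,\right]

% A two-headed estimator predicts one outcome per treatment arm and
% takes the difference (assumed notation):
\hat{\tau}(x) = \hat{\mu}_1(x) - \hat{\mu}_0(x)
```

In this notation, a purely prognostic biomarker shifts both potential outcomes equally and therefore cancels in $\tau(x)$, whereas a predictive biomarker changes the difference between them.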

Paper Structure (16 sections, 4 equations, 5 figures)

Introduction
Methods
Treatment heterogeneity and predictive biomarkers
Image-based treatment effect estimator
Proposed evaluation protocol
Statistical evaluation of the predictive strength
Interpretation using feature attribution methods
Simulation of imaging biomarkers and outcomes for validation
Experimental Setup
Datasets and imaging biomarker features
Results
Predictive strength of the estimated CATE
Interpreting predictive imaging biomarkers
Discussion
Conclusion
...and 1 more sections

Figures (5)

Figure 1: Relationship between biomarkers $x_{\mathit{prog}}$ and $x_{\mathit{pred}}$, outcomes $Y(T)$ depending on the treatment $T$, and the treatment effect $\tau$. Since both potential outcomes $Y_i(T=0)$ and $Y_i(T=1)$ cannot be observed for the same individual simultaneously, it is impossible to infer the individual treatment effect directly.
Figure 2: Overview of the identification of predictive biomarkers from pre-treatment images. The (a) training and (b) inference steps employ a two-headed architecture to estimate treatment effects $\hat{\tau}$ from images. In the evaluation step (c), the predictive strength of the estimated $\hat{\tau}$, the predictive biomarker candidate, is assessed using regression. In our simulation experiments (d), the outcome data $Y_i$ are simulated with image features from ground truth annotations and randomly assigned treatments $T$.
Figure 3: Image features from the four datasets, where either feature 1 or 2 is designated as predictive or prognostic biomarkers. ISIC 2018 skin lesion features are shown with ground truth masks. Globules (light green mask) manifest as darker dots, pigment networks have dark grid-like patterns of streaks with lighter "holes" (dark blue mask). The NSCLC-Radiomics images display tumor segmentation outlines of a 2D slice (left) or corresponding 3D volumes (right). Examples on the bottom row depict images where both biomarkers are either absent or have a low value.
Figure 4: Model performance based on the relative predictive strength $t_{\mathit{pred}}/t_{\mathit{prog}}$ of the CATE, shown on a logarithmic scale. We compare our two-headed CATE estimator with a one-headed baseline model across different simulation parameters $b_{\mathit{pred}}/b_{\mathit{prog}}$ (i.e., the relative size of the predictive effect in the simulated outcomes). Boxplots summarize data averaged over $b_{\mathit{pred}}/b_{\mathit{prog}}$-bin widths, indicated by the horizontal error bars over the median line. Rows (a) and (b) correspond to different sets of prognostic and predictive features used for generating the data (see Section "Datasets and imaging biomarker features" and Figure 3). The variance of the boxplots is affected by the differing number of samples each bin contains.
Figure 5: Attribution maps for the control group prediction head (last row) and the predicted CATE output (middle row) for different example images from each dataset (top row). For the CMNIST dataset, the attribution is shown for each RGB color channel (red: left, green: top, blue: right), as the color information is important for the biomarker prediction. An additional zoomed-in patch of the ISIC 2018 attribution map is overlaid with a grayscale version of the original image. For the NSCLC-Radiomics dataset, sagittal slices of the 3D patches are shown with orange outlines of segmented tumors. Here, results are based on models trained with $b_{\mathit{pred}},b_{\mathit{prog}} = 1.0$.
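
The simulation and regression steps referred to in Figures 2 and 4 can be illustrated with a small NumPy sketch. It rests on stated assumptions and is not the authors' implementation: the linear outcome model `y = b_prog * x_prog + b_pred * x_pred * t + noise`, the regression design `[1, t, candidate, candidate * t]`, and the function names `simulate_outcomes` and `interaction_t_stats` are all hypothetical, chosen to match the symbols $b_{\mathit{pred}}$, $b_{\mathit{prog}}$, $t_{\mathit{pred}}$, and $t_{\mathit{prog}}$ used in the captions. The biomarker values are plain random numbers here, standing in for features that would be derived from image annotations.

```python
import numpy as np


def simulate_outcomes(x_prog, x_pred, b_prog=1.0, b_pred=1.0, noise_sd=0.1, seed=0):
    """Simulate randomized treatments and outcomes (assumed linear model)."""
    rng = np.random.default_rng(seed)
    n = len(x_prog)
    t = rng.integers(0, 2, size=n)  # randomized binary treatment assignment
    # Prognostic feature acts in both arms; predictive feature only under treatment.
    y = b_prog * x_prog + b_pred * x_pred * t + rng.normal(0.0, noise_sd, size=n)
    return t, y


def interaction_t_stats(y, t, candidate):
    """OLS of y on [1, t, candidate, candidate * t].

    Returns (t_prog, t_pred): the t-statistics of the candidate's main effect
    and of its interaction with the treatment, respectively.
    """
    candidate = np.asarray(candidate, dtype=float)
    X = np.column_stack([np.ones_like(candidate), t, candidate, candidate * t])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    dof = X.shape[0] - X.shape[1]
    sigma2 = resid @ resid / dof            # unbiased residual variance
    cov = sigma2 * np.linalg.inv(X.T @ X)   # covariance of the OLS coefficients
    t_stats = beta / np.sqrt(np.diag(cov))
    return t_stats[2], t_stats[3]


# Usage: stand-in biomarker values (in the paper these come from image features).
rng = np.random.default_rng(1)
x_prog = rng.uniform(size=2000)
x_pred = rng.uniform(size=2000)
t, y = simulate_outcomes(x_prog, x_pred)

# An ideal CATE estimate would track x_pred; a poor one would track x_prog.
for name, cand in [("x_pred", x_pred), ("x_prog", x_prog)]:
    t_prog, t_pred = interaction_t_stats(y, t, cand)
    print(f"{name}: |t_pred / t_prog| = {abs(t_pred / t_prog):.2f}")
```

Under these assumptions, a candidate that follows the predictive feature should yield a ratio well above 1, and a candidate that follows the prognostic feature a ratio well below 1, which is the quantity Figure 4 reports on a logarithmic scale.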
