Multi-modal AI for comprehensive breast cancer prognostication

Jan Witowski, Ken G. Zeng, Joseph Cappadona, Jailan Elayoubi, Khalil Choucair, Elena Diana Chiru, Nancy Chan, Young-Joon Kang, Frederick Howard, Irina Ostrovnaya, Carlos Fernandez-Granda, Freya Schnabel, Zoe Steinsnyder, Ugur Ozerdem, Kangning Liu, Waleed Abdulsattar, Yu Zong, Lina Daoud, Rafic Beydoun, Anas Saad, Nitya Thakore, Mohammad Sadic, Frank Yeung, Elisa Liu, Theodore Hill, Benjamin Swett, Danielle Rigau, Andrew Clayburn, Valerie Speirs, Marcus Vetter, Lina Sojak, Simone Soysal, Daniel Baumhoer, Jia-Wern Pan, Haslina Makmur, Soo-Hwang Teo, Linda Ma Pak, Victor Angel, Dovile Zilenaite-Petrulaitiene, Arvydas Laurinavicius, Natalie Klar, Brian D. Piening, Carlo Bifulco, Sun-Young Jun, Jae Pak Yi, Su Hyun Lim, Adam Brufsky, Francisco J. Esteva, Lajos Pusztai, Yann LeCun, Krzysztof J. Geras

TL;DR

This study presents a multi-modal AI test for breast cancer prognosis that fuses pathology-derived features from a self-supervised vision transformer (Kestrel) with standard clinical variables to predict recurrence and survival. The model ensembles pathology and clinical predictions and is evaluated across 8,161 patients from 15 cohorts, achieving a pooled C-index of 0.71 for disease-free interval (DFI) and a hazard ratio of 3.63 for high-risk patients, outperforming Oncotype DX in HR+ cohorts. Importantly, the AI test maintains accuracy across major subtypes, including TNBC and HER2+, and provides independent prognostic information beyond existing clinical and genomic markers, suggesting broad applicability and potential to refine treatment decisions. The approach offers faster, cheaper risk stratification using routine H&E slides, with strong translational potential pending prospective validation and integration into clinical workflows.

Abstract

Treatment selection in breast cancer is guided by molecular subtypes and clinical characteristics. However, current tools including genomic assays lack the accuracy required for optimal clinical decision-making. We developed a novel artificial intelligence (AI)-based approach that integrates digital pathology images with clinical data, providing a more robust and effective method for predicting the risk of cancer recurrence in breast cancer patients. Specifically, we utilized a vision transformer pan-cancer foundation model trained with self-supervised learning to extract features from digitized H&E-stained slides. These features were integrated with clinical data to form a multi-modal AI test predicting cancer recurrence and death. The test was developed and evaluated using data from a total of 8,161 female breast cancer patients across 15 cohorts originating from seven countries. Of these, 3,502 patients from five cohorts were used exclusively for evaluation, while the remaining patients were used for training. Our test accurately predicted our primary endpoint, disease-free interval, in the five evaluation cohorts (C-index: 0.71 [0.68-0.75], HR: 3.63 [3.02-4.37, p<0.001]). In a direct comparison (n=858), the AI test was more accurate than Oncotype DX, the standard-of-care 21-gene assay, achieving a C-index of 0.67 [0.61-0.74] versus 0.61 [0.49-0.73], respectively. Additionally, the AI test added independent prognostic information to Oncotype DX in a multivariate analysis (HR: 3.11 [1.91-5.09, p<0.001]). The test demonstrated robust accuracy across major molecular breast cancer subtypes, including TNBC (C-index: 0.71 [0.62-0.81], HR: 3.81 [2.35-6.17, p=0.02]), where no diagnostic tools are currently recommended by clinical guidelines. These results suggest that our AI test improves upon the accuracy of existing prognostic tests, while being applicable to a wider range of patients.
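
The abstract reports accuracy as a C-index (concordance index). As a reading aid, the following is a minimal sketch of Harrell's C-index for right-censored data. It is not the authors' evaluation code: the function name `concordance_index`, the handling of tied follow-up times (skipped), and the toy data are all assumptions made for illustration.

```python
from itertools import combinations


def concordance_index(times, events, risk_scores):
    """Harrell's C-index for right-censored data (illustrative sketch).

    A pair of patients is comparable when the one with the shorter
    follow-up time had an observed event. The pair is concordant when
    that patient also has the higher predicted risk; tied risk scores
    count as half. Pairs with tied follow-up times are skipped here
    for simplicity (an assumption of this sketch).
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        if times[i] == times[j]:
            continue
        first, second = (i, j) if times[i] < times[j] else (j, i)
        if not events[first]:
            continue  # earlier patient was censored, so the ordering is unknown
        comparable += 1
        if risk_scores[first] > risk_scores[second]:
            concordant += 1.0
        elif risk_scores[first] == risk_scores[second]:
            concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical toy data: follow-up in years, event flag (1 = recurrence
# observed, 0 = censored), and a predicted risk score per patient.
toy_times = [2, 4, 6, 8, 10]
toy_events = [1, 0, 1, 1, 0]
toy_risk = [0.9, 0.3, 0.7, 0.8, 0.1]

toy_c_index = concordance_index(toy_times, toy_events, toy_risk)
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering, so the reported 0.71 means that in roughly 71% of comparable patient pairs the patient who recurred earlier was assigned the higher risk score.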

Paper Structure

This paper contains 41 sections, 12 equations, 12 figures, 6 tables, 2 algorithms.

Figures (12)

  • Figure 1: We present a multi-modal AI test for invasive breast cancer. a, The key component of the test is a system which processes high-resolution digital images of breast cancer specimens. Features are extracted from the digitized slides using Kestrel, a foundation model trained using self-supervised learning on a pan-cancer dataset of 400 million pathology image patches. b, Extracted pathology features and clinical features are used to train supervised time-to-event models predicting breast cancer recurrence or death. c, The AI test produces a multi-modal risk score, integrating pathological and clinical risk scores. d, We developed the AI test using 4,659 patients across 10 cohorts from six countries, and evaluated it on 3,502 patients from five patient cohorts which were not used during training. Evaluation sets consisted of two cohorts (Providence and TCGA) with all invasive breast cancer subtypes, and three cohorts (UChicago, Basel, and Karmanos) with only HR+/HER2- patients tested with Oncotype DX. e, The 10-year recurrence probability increases monotonically as the AI test score increases. f, Patients with predicted high risk had significantly worse outcomes compared to patients with predicted low risk. g, The AI test achieved strong prognostic results across all evaluation datasets. h, In a direct comparison (n=858) to a standard-of-care genomic assay, Oncotype DX, the AI test was a better predictor of cancer recurrence. i, The AI test performs consistently across various endpoints (DFI - disease-free interval, DRFI - distant recurrence-free interval, RFS - recurrence-free survival, DRFS - distant recurrence-free survival, OS - overall survival). Endpoint definitions are in Appendix \ref{appendix:endpoints}.
  • Figure 2: The AI test enhances the prognostic accuracy over standard-of-care genomic assays for predicting cancer recurrence. Oncotype DX, a standard-of-care 21-gene assay, classifies patients into low-, intermediate-, and high-risk groups. The AI test demonstrated statistically significant discrimination between high- and low-risk patients without the need to introduce an intermediate-risk group, thereby enhancing clarity in decision-making. We analyze the differences in classification and patient outcomes between the two tests using data from the Karmanos, Basel, and UChicago cohorts. a, Comparison of prognostic ability between Oncotype DX and the AI test in univariate models. We report the hazard ratio per 0.2 increase in our score and per 20-point increase in the Oncotype DX score. b, A scatter plot illustrating risk scores for Oncotype DX-tested patients. Each point represents one patient. The majority of patients with intermediate Oncotype DX scores were reclassified into the low-risk group by the AI test. c, For intermediate-risk Oncotype DX patients, the AI test was able to accurately distinguish between low- and high-risk patients (HR 2.84 [1.47-5.49, p=0.002]). d, Hazard ratios associated with common clinical covariates in a multivariate Cox analysis, with and without including the AI test. The AI test is significantly associated with DFI after adjusting for Oncotype DX score, grade (based on the Nottingham grading system, categorized from 1 to 3) and race in a multivariate Cox regression model. e, Comparing the AI test's modalities in a multivariate Cox model shows that the pathology score was more informative than the clinical score within the Oncotype DX cohort (see Table \ref{tab:cox-path-clin}).
  • Figure 3: The AI test performs well in all major clinically relevant groups. Forest plots display our AI test's performance across clinical, molecular, and demographic groups. Next to each group name we report the number of disease-free interval events and total number of patients. Subgroup performances were pooled across all evaluation cohorts using a random effects model. For subgroup performance based on histological subtype, we excluded patients from the Providence cohort because histological subtype data were unavailable.
  • Figure 4: The AI test is prognostic in clinically meaningful subgroups. All plots in this figure are for the Providence cohort (the largest evaluation cohort). a, In triple-negative breast cancer, there is an ongoing discussion about potential strategies to de-escalate the KEYNOTE-522 regimen, for example with shorter or anthracycline-free chemotherapy regimens or complete omission of adjuvant treatment. While some biomarkers, such as tumor-infiltrating lymphocytes, are associated with long-term outcomes and pathological complete response, there is still a need for a more robust assessment of systemic therapy benefits. b-d, In hormone receptor-positive patients, questions about treatment selection involve the addition of adjuvant chemotherapy, extended endocrine therapy (ET), and new agents, such as CDK4/6 inhibitors. We hypothesize that high-risk HR+ patients who received adjuvant ET alone (b) might benefit from the addition of chemotherapy, and high-risk HR+ patients who received chemoendocrine therapy (c) might benefit from the addition of CDK4/6 inhibitors. Finally, HR+ patients who received five years of adjuvant endocrine therapy (d) are candidates for extended endocrine treatment. A few assays have been shown to be associated with late recurrence and predict the benefit of extended endocrine therapy [bartlett2019breast].
  • Figure 5: Relationship between predicted risk of recurrence and established prognostic factors. The top plot uses the Providence cohort to show the relationship between the AI test score and established factors such as ER/HER2 status and staging. The bottom plot combines the Basel, Karmanos and UChicago cohorts and shows the Oncotype DX score in relation to the AI test score. Higher scores are observed in patients with HR- and HER2+ status and in patients with more advanced T or N staging. Higher AI test scores do not appear to be strongly associated with higher Oncotype DX risk, as indicated by the near-zero correlation ($R^2=0.02$) between our AI test scores and the Oncotype DX scores.
  • ...and 7 more figures
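
The Figure 3 caption states that subgroup performances were pooled across evaluation cohorts using a random effects model, without naming the estimator. The sketch below uses the DerSimonian-Laird estimator, which is one common choice and an assumption here, not a detail confirmed by the source. The function name `pool_random_effects` and the example inputs are hypothetical.

```python
import math


def pool_random_effects(estimates, std_errors):
    """Pool per-cohort estimates with a DerSimonian-Laird random effects model.

    estimates:  per-cohort effect estimates (e.g. C-index or log hazard ratio)
    std_errors: their standard errors
    Returns (pooled_estimate, pooled_std_error, tau_squared).
    """
    k = len(estimates)
    # Fixed-effect (inverse-variance) weights and pooled mean.
    w = [1.0 / se ** 2 for se in std_errors]
    sum_w = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, estimates)) / sum_w

    # Cochran's Q measures between-cohort heterogeneity.
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, estimates))
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w

    # Between-cohort variance, truncated at zero.
    tau2 = max(0.0, (q - (k - 1)) / c) if c > 0 else 0.0

    # Random-effects weights add tau^2 to each cohort's variance.
    w_star = [1.0 / (se ** 2 + tau2) for se in std_errors]
    sum_w_star = sum(w_star)
    pooled = sum(wi * yi for wi, yi in zip(w_star, estimates)) / sum_w_star
    pooled_se = math.sqrt(1.0 / sum_w_star)
    return pooled, pooled_se, tau2


# Hypothetical example: two cohorts with equal precision but different estimates.
pooled, pooled_se, tau2 = pool_random_effects([0.6, 0.8], [0.1, 0.1])
```

When the cohorts agree closely, the between-cohort variance is truncated to zero and the result reduces to the usual inverse-variance (fixed-effect) average; when they disagree, the pooled standard error widens to reflect the heterogeneity.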