BRIGHT: A Collaborative Generalist-Specialist Foundation Model for Breast Pathology

Xiaojing Guo, Jiatai Lin, Yumian Jia, Jingqi Huang, Zeyan Xu, Weidong Li, Longfei Wang, Jingjing Chen, Qin Li, Weiwei Wang, Lifang Cui, Wen Yue, Zhiqiang Cheng, Xiaolong Wei, Jianzhong Yu, Xia Jin, Baizhou Li, Honghong Shen, Jing Li, Chunlan Li, Yanfen Cui, Yi Dai, Yiling Yang, Xiaolong Qian, Liu Yang, Yang Yang, Guangshen Gao, Yaqing Li, Lili Zhai, Chenying Liu, Tianhua Zhang, Zhenwei Shi, Cheng Lu, Xingchen Zhou, Jing Xu, Miaoqing Zhao, Fang Mei, Jiaojiao Zhou, Ning Mao, Fangfang Liu, Chu Han, Zaiyi Liu

TL;DR

BRIGHT is the first pathology foundation model (PFM) designed specifically for breast pathology. It is trained on approximately 210 million histopathology tiles from over 51,000 breast whole-slide images, derived from a cohort of over 40,000 patients across 19 hospitals, and provides a scalable template for developing PFMs for a specific organ system.

Abstract

Generalist pathology foundation models (PFMs), pretrained on large-scale multi-organ datasets, have demonstrated remarkable predictive capabilities across diverse clinical applications. However, their proficiency on the full spectrum of clinically essential tasks within a specific organ system remains an open question due to the lack of large-scale validation cohorts for a single organ as well as the absence of a tailored training paradigm that can effectively translate broad histomorphological knowledge into the organ-specific expertise required for specialist-level interpretation. In this study, we propose BRIGHT, the first PFM specifically designed for breast pathology, trained on approximately 210 million histopathology tiles from over 51,000 breast whole-slide images (WSIs) derived from a cohort of over 40,000 patients across 19 hospitals. BRIGHT employs a collaborative generalist-specialist framework to capture both universal and organ-specific features. To comprehensively evaluate the performance of PFMs on breast oncology, we curate the largest multi-institutional cohorts to date for downstream task development and evaluation, comprising over 25,000 WSIs across 10 hospitals. The validation cohorts cover the full spectrum of breast pathology across 24 distinct clinical tasks spanning diagnosis, biomarker prediction, treatment response, and survival prediction. Extensive experiments demonstrate that BRIGHT outperforms three leading generalist PFMs, achieving state-of-the-art (SOTA) performance in 21 of 24 internal validation tasks and in 5 of 10 external validation tasks with excellent heatmap interpretability. By evaluating on large-scale validation cohorts, this study not only demonstrates BRIGHT's clinical utility in breast oncology but also validates a collaborative generalist-specialist paradigm, providing a scalable template for developing PFMs on a specific organ system.

Paper Structure (21 sections, 5 figures)

Figures (5)

  • Figure 1: Overview of the BRIGHT study. a, Schematic of the clinical spectrum covered by the study, encompassing diagnostic subtyping, molecular biomarker prediction, treatment response assessment, and long-term survival prediction in breast oncology. b, Distribution of datasets used for the development and evaluation of each downstream clinical prediction task. BRIGHT covers a wider range of clinical tasks and more benchmarking data than existing benchmarking studies. c, Architecture of BRIGHT. A leading generalist pathology foundation model (PFM, Virchow2) is specialized for breast pathology via Low-Rank Adaptation (LoRA), yielding a specialist PFM (BRIGHT (S)). The final BRIGHT model is formed by the collaborative integration of feature embeddings from both generalist and specialist encoders. d, BRIGHT generally outperforms three leading PFMs and our specialist model BRIGHT (S) in 21 internal validation benchmarks. e,f, Model ranking summary. The number of tasks (among 24 internal (e) and 10 external (f) benchmarks) in which each foundation model achieved top-1 (left bars) or top-2 (right bars) performance based on AUROC for classification tasks and C-index for survival prediction tasks.
  • Figure 2: Performance of PFMs on diagnostic tasks of breast pathology. a, Distribution of the downstream model development, internal test and external test datasets. The number in brackets is the number of hospitals for external validation. b, Model performance (AUROC) across individual diagnostic tasks on the internal validation set. The pie charts below indicate the total number of WSIs and the class distribution for each task. The values in brackets indicate the numbers of categories. c, Receiver operating characteristic (ROC) curves for the breast cancer detection task (breast cancer vs. non-cancerous/other conditions). d-f, Detailed analysis of the 10-class histological diagnosis task: confusion matrix (d), weighted F1-scores across models (e), and top-n accuracy (f). g, Model performance (AUROC) on two diagnostic tasks across the combined external validation cohorts. h, Representative heatmap visualizations of BRIGHT on three different histological subtypes. The error bars denote the two-sided 95% confidence interval computed from 1,000 bootstrap resamples. Can. Det., Cancer detection. Histo. Diag., Histological diagnosis.
  • Figure 3: Performance of PFMs on molecular subtyping and biomarker prediction tasks of breast pathology. a, Distribution of the downstream model development, internal test and external test datasets. b, Model performance (AUROC) across individual tasks on the internal validation set. The values in brackets indicate the numbers of categories. c, Model performance (AUROC) on four biomarker prediction tasks and the molecular subtype prediction task across the combined external validation cohorts. d, Potential reduction in immunohistochemistry (IHC) testing, estimated by applying dual clinical thresholds, negative predictive value (NPV) and positive predictive value (PPV), to model predictions. For a given biomarker, samples with a BRIGHT-predicted probability below a predefined NPV threshold are considered true negatives, while those above a PPV threshold are considered true positives. IHC assays for these samples could potentially be waived. The proportion of such samples across the cohort yields the potential IHC assay reduction rate at those specific NPV/PPV thresholds (plotted from 1.0 to 0.95). e, Heatmaps generated by BRIGHT visualize the model's spatial attention when predicting key biomarkers (ER, PR, HER2, Ki-67) across the four major molecular subtypes (Luminal A, Luminal B, HER2-enriched, Triple-negative) of breast cancer. For each comparison, the corresponding diagnostic IHC image is shown in the top-right inset for direct visual comparison. Shaded areas and error bars denote the 95% confidence interval computed from 1,000 bootstrap resamples.
  • Figure 4: Performance of PFMs in predicting breast cancer treatment response. a, Distribution of the downstream model development, internal test and external test datasets. b-c, Model performance (AUROC) across individual neoadjuvant therapy (NAT) response prediction tasks on the internal (b) and external (c) validation sets. The values in brackets indicate the numbers of categories. d, Performance (AUROC) of the models in predicting response to neoadjuvant immunotherapy for triple-negative breast cancer (TNBC) patients, evaluated using five-fold cross-validation. e, Heatmaps generated by BRIGHT for a patient who responded to immune checkpoint blockade (ICB; left) and a non-responder (right). The model’s attention is prominently localized to regions rich in tumor-infiltrating lymphocytes (TILs), aligning with known immunological correlates of treatment response. Error bars denote the 95% confidence interval computed from 1,000 bootstrap resamples.
  • Figure 5: Performance of PFMs in prognostic prediction. a, Box plots comparing the C-index of BRIGHT with PFMs across three experimental settings: C26.TCGA-BRCA overall survival (OS), C26.TCGA-BRCA disease-free survival (DFS), and OS in the TNBC cohort of the C1.TMUCIH. Each box summarizes the C-index values obtained from all validation folds in 10-fold cross-validation. b, Kaplan-Meier survival curves for OS and DFS based on BRIGHT-derived risk stratification in the validation sets. Patients were stratified into high-risk (H, red) and low-risk (L, blue) groups according to the median risk score calculated from the training set. Statistical significance was assessed using a two-sided log-rank test. c, Forest plots of multivariable Cox proportional hazards regression analyses for OS or DFS, showing that BRIGHT risk group (H v L) is a statistically significant predictor.
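
The dual-threshold estimate described in Figure 3d can be illustrated with a small sketch. This is not the authors' code: the caption does not say how the probability cutoffs are derived, so the function names (`pick_cutoffs`, `ihc_reduction_rate`) and the cutoff-selection rule (the widest cutoffs that still meet the target NPV and PPV on labeled data) are assumptions made for illustration only.

```python
from typing import List, Optional, Sequence, Tuple


def _rate(called: List[int], wanted: int) -> Optional[float]:
    """Fraction of called samples whose true label equals `wanted`; None if none were called."""
    if not called:
        return None
    return called.count(wanted) / len(called)


def pick_cutoffs(
    probs: Sequence[float],
    labels: Sequence[int],
    target_npv: float,
    target_ppv: float,
) -> Tuple[Optional[float], Optional[float]]:
    """Assumed rule: choose the largest low cutoff whose NPV (among p < cutoff)
    meets target_npv, and the smallest high cutoff whose PPV (among p > cutoff)
    meets target_ppv. Labels are 0 (biomarker-negative) or 1 (biomarker-positive)."""
    candidates = sorted(set(probs))
    low = None
    for c in candidates:  # ascending, so the last hit is the largest cutoff
        npv = _rate([y for p, y in zip(probs, labels) if p < c], 0)
        if npv is not None and npv >= target_npv:
            low = c
    high = None
    for c in reversed(candidates):  # descending, so the last hit is the smallest cutoff
        ppv = _rate([y for p, y in zip(probs, labels) if p > c], 1)
        if ppv is not None and ppv >= target_ppv:
            high = c
    return low, high


def ihc_reduction_rate(
    probs: Sequence[float], low: Optional[float], high: Optional[float]
) -> float:
    """Proportion of samples whose IHC assay could be waived: predicted
    probability below the low cutoff or above the high cutoff."""
    waived = sum(
        1
        for p in probs
        if (low is not None and p < low) or (high is not None and p > high)
    )
    return waived / len(probs)
```

Sweeping `target_npv` and `target_ppv` from 1.0 down to 0.95 and recording `ihc_reduction_rate` at each step would produce a curve of the kind the caption describes. In practice the cutoffs would be fixed on a development set and the reduction rate reported on held-out data.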
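
The error bars in Figures 2 to 4 are described as 95% confidence intervals from 1,000 bootstrap resamples. A minimal sketch of one common way to compute such an interval for AUROC follows. The percentile method, the fixed seed, and the choice to skip single-class resamples are assumptions; the captions do not specify these details.

```python
import random
from typing import List, Sequence, Tuple


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC as the probability that a positive outscores a negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative sample")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_ci(
    labels: Sequence[int],
    scores: Sequence[float],
    n_boot: int = 1000,
    alpha: float = 0.05,
    seed: int = 0,
) -> Tuple[float, float]:
    """Two-sided percentile bootstrap interval for AUROC (assumed method)."""
    rng = random.Random(seed)
    n = len(labels)
    stats: List[float] = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]  # sample cases with replacement
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:  # skip resamples that contain only one class
            stats.append(auroc(ys, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return lower, upper
```

The pairwise AUROC above is quadratic in cohort size and is written for clarity; for cohorts with thousands of WSIs a rank-based implementation would be the practical choice. The same resampling loop applies to other metrics, such as the C-index, by swapping the statistic.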