AI-powered virtual tissues from spatial proteomics for clinical diagnostics and biomedical discovery

Johann Wenckstern; Eeshaan Jain; Yexiang Cheng; Benedikt von Querfurth; Kiril Vasilev; Matteo Pariset; Phil F. Cheng; Petros Liakopoulos; Olivier Michielin; Andreas Wicki; Gabriele Gut; Charlotte Bunne

TL;DR

VirTues presents a marker-aware, multi-scale foundation approach for spatial proteomics that unifies high-plex tissue measurements across heterogeneous panels. By fusing protein-language embeddings with a proteomics-tailored transformer and a masked autoencoding objective, it learns robust representations at molecule, cell, niche, and tissue scales, enabling zero-shot annotation, cross-cohort biomarker discovery, and clinically relevant predictions. The framework demonstrates cross-dataset generalization, effective tissue retrieval, and transferable spatial biomarkers that outperform traditional panel-specific methods, with demonstrated utility in predicting immunotherapy responses and stratifying patient risk. Together, these results establish a generalizable, interpretable pipeline for translational spatial biology that can adapt to varying marker panels and support panel design, biomarker discovery, and clinical decision support.

Abstract

Spatial proteomics technologies have transformed our understanding of complex tissue architecture in cancer but present unique challenges for computational analysis. Each study uses a different marker panel and protocol, and most methods are tailored to single cohorts, which limits knowledge transfer and robust biomarker discovery. Here we present Virtual Tissues (VirTues), a general-purpose foundation model for spatial proteomics that learns marker-aware, multi-scale representations of proteins, cells, niches and tissues directly from multiplex imaging data. From a single pretrained backbone, VirTues supports marker reconstruction, cell typing and niche annotation, spatial biomarker discovery, and patient stratification, including zero-shot annotation across heterogeneous panels and datasets. In triple-negative breast cancer, VirTues-derived biomarkers predict anti-PD-L1 chemo-immunotherapy response and stratify disease-free survival in an independent cohort, outperforming state-of-the-art biomarkers derived from the same datasets and current clinical stratification schemes.

Paper Structure (23 sections, 11 equations, 5 figures)

Tokenization.
Masking.
VirTues Encoder.
VirTues Decoder.
Aggregation into cell-, niche- and tissue-level representations.
Implementation details.
Loss function.
Data augmentation.
Optimization.
Dataset curation.
Dataset preprocessing.
VirTues model instances.
Comparisons and baselines.
Masked reconstructions.
Cell-level tasks.
...and 8 more sections

Figures (5)

Figure 1: Overview of the Virtual Tissues platform. a, Flow chart depicting VirTues capabilities. VirTues converts highly multiplexed images of tissue to virtual tissue representations useful for clinical and biological investigations at cell, niche and sample level including the retrieval of similar tissue samples for clinical decision support. b, VirTues is trained and evaluated on 15 IMC datasets with a focus on tumors and their micro-environments originating from 8 different organ sites, measuring 147 distinct markers in total. The polar plot depicts used marker panels per dataset. A legend of the dataset color codes is provided in Suppl. Fig. \ref{suppfig:legend_datasets}. c, Origins and sizes of datasets in terms of patients, tissue samples, and $256\times256$ image crops. d, Multiplexed images are processed crop-wise into 3D grids of image tokens, representing patches of each marker at each position. Marker tokens, derived from a protein language model, are fused with the respective image tokens using a linear projection and addition. VirTues is a novel vision transformer architecture trained with a masked autoencoding objective. Input tokens are concatenated with patch summary tokens, which are initialized with learnable weights. During inference, VirTues' encoder processes this set of tokens. The encoded patch summaries are subsequently convolved with the cell segmentation mask into cell summary tokens or aggregated to niche and tissue summary tokens. For training, a random subset of tokens is independently selected and masked for each channel. VirTues' decoder predicts channel-wise reconstructions receiving as input the encoded, non-masked tokens from the target channel along with all patch summary tokens. VirTues encoder uses sparse attention mechanisms restricting direct token interactions to either positions (marker attention) or channels (spatial attention). e, Comparison of computational cost (left) and prediction performance (right) between channel-agnostic masked auto-encoder (CA-MAE; kraus2024masked) and VirTues as a function of the number of utilized markers.
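To make the cost argument behind panels d and e concrete, the following sketch counts query-key pairs for one full-attention layer versus one marker-attention layer plus one spatial-attention layer. It is only an illustrative count, not the authors' cost measurement: the function name `attention_pairs`, the 32 x 32 patch grid, and the marker counts are assumptions chosen for the example.

```python
def attention_pairs(num_markers: int, num_positions: int) -> dict:
    """Count query-key pairs for a grid of (marker, position) tokens."""
    total_tokens = num_markers * num_positions
    # Full attention: every token attends to every other token.
    full = total_tokens ** 2
    # Marker attention: tokens interact only with tokens at the same position,
    # so there are num_positions groups of num_markers tokens each.
    marker = num_positions * num_markers ** 2
    # Spatial attention: tokens interact only within their own channel,
    # so there are num_markers groups of num_positions tokens each.
    spatial = num_markers * num_positions ** 2
    return {
        "full": full,
        "marker": marker,
        "spatial": spatial,
        "sparse": marker + spatial,
    }


if __name__ == "__main__":
    # Hypothetical sizes: a 32 x 32 patch grid and panels of 10 to 40 markers.
    for num_markers in (10, 20, 40):
        counts = attention_pairs(num_markers, 32 * 32)
        ratio = counts["full"] / counts["sparse"]
        print(num_markers, counts["full"], counts["sparse"], round(ratio, 1))
```

Under these assumptions the full count grows with the square of the number of markers times positions, whereas the sparse count grows as C·P·(C + P) for C markers and P positions.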
Figure 2: VirTues learns tissue architecture and marker relationships. a-c, Illustrations of masking strategies and reconstruction examples. Visualized images are scaled to the interval $[0,1]$ by reversing the standardization and log-transformation of the preprocessing and dividing by the 99th percentile. a, Independent masking. Tokens of each channel are masked independently, with a channel-wise random masking ratio ranging from 60% to 100%. b, Marker masking. One marker is chosen and all tokens of its channel are masked while all other channels remain unmasked. In the examples, each row depicts the inpainting of different channels from the same tissue sample. c, Niche masking. A subset of spatial positions is chosen and all tokens across markers are masked at these positions. Each row depicts the niche reconstructions of different channels of the same tissue sample. d, Dataset-wise reconstruction performance quantified by Pearson correlation and averaged across markers for independent (round), marker (cross) and niche (triangle) masking. For comparison, light gray squares represent the correlation obtained by predicting the mean channel intensity of visible pixels for all masked pixels under independent masking. Dark gray diamonds indicate the correlation achieved by predicting, for each marker, the most highly correlated other marker. Contrary to the training objective, the reconstruction performance is assessed only on masked tokens. A legend of the dataset color codes is provided in Suppl. Fig. \ref{suppfig:legend_datasets}. e, VirTues' architecture enables the generation of virtual tissue representations for multiplexed images from new datasets, including those with markers not observed during pretraining. These representations support a variety of reconstruction and prediction tasks. f, UMAP embeddings of all marker tokens derived from ESM-2. Markers included in the panel of rigamonti2024integrating are displayed in blue, all others in light gray. Markers seen during pretraining, when training without rigamonti2024integrating, are depicted as round, while new markers unique to rigamonti2024integrating are shown as crosses. g, Examples of zero-shot reconstructions for markers CD14 and CD63 from rigamonti2024integrating, where CD14 was observed and CD63 unobserved during pretraining. For comparison, reconstructions of the same samples from a model additionally pretrained on rigamonti2024integrating are shown. h, Comparison of marker-wise reconstruction performance in the zero-shot and non-zero-shot setting on rigamonti2024integrating. Markers are grouped into those seen during pretraining (left) and unseen ones (right). Dashed lines indicate the averages across markers using marker masking.
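The three masking strategies in panels a-c can be sketched as boolean masks over a channels-by-positions token grid, where `True` marks a masked token. This is a minimal illustration under assumptions, not the authors' implementation: the function names, the list-of-lists layout, and the uniform sampling of the masking ratio are hypothetical; only the 60% to 100% range comes from the caption.

```python
import random


def independent_mask(num_channels, num_positions, rng, low=0.6, high=1.0):
    """Mask each channel separately with its own ratio drawn from [low, high]."""
    mask = []
    for _ in range(num_channels):
        ratio = rng.uniform(low, high)  # assumed uniform; the caption gives only the range
        num_masked = min(num_positions, round(ratio * num_positions))
        hidden = set(rng.sample(range(num_positions), num_masked))
        mask.append([p in hidden for p in range(num_positions)])
    return mask


def marker_mask(num_channels, num_positions, channel):
    """Mask every token of one channel; all other channels stay visible."""
    return [[c == channel] * num_positions for c in range(num_channels)]


def niche_mask(num_channels, num_positions, positions):
    """Mask the same spatial positions across all channels."""
    hidden = set(positions)
    row = [p in hidden for p in range(num_positions)]
    return [list(row) for _ in range(num_channels)]


if __name__ == "__main__":
    rng = random.Random(0)
    for name, mask in [
        ("independent", independent_mask(3, 8, rng)),
        ("marker", marker_mask(3, 8, channel=1)),
        ("niche", niche_mask(3, 8, positions=[2, 3])),
    ]:
        print(name, [sum(row) for row in mask])  # masked tokens per channel
```

Marker masking tests whether a channel can be inferred from the other markers, while niche masking tests whether a region can be inferred from its spatial context.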
Figure 3: Evaluation of VirTues' cell-level representations. a, Cell types are classified with logistic regression using individual cell summary tokens. b, F1-scores for cell type classification for VirTues, KRONOS (shaban2025foundation) and CA-MAE (kraus2024masked) on cords2024cancer, wang2023spatial, hoch2022multiplexed and danenberg2022breast. c, Comparison of F1-scores for cell type classifications between VirTues trained on all pretraining datasets and VirTues trained only on danenberg2022breast. d, Comparison of cell type classification performance in the zero-shot and the non-zero-shot setting measured by macro-averaged F1-score. Results are shown from left to right for cords2024cancer (coarse cell types), wang2023spatial (coarse cell types), hoch2022multiplexed and danenberg2022breast. e, Virtual Tissues facilitates the supervised transfer of annotations between datasets. We train a Random Forest classifier on the cell summary tokens of cords2024cancer to predict cell types using available labels. Subsequently, we employ the trained classifier to annotate cell summary tokens of rigamonti2024integrating computed using the shared markers. f, Examples of cell type masks generated using transferred labels. We compare to the transfer results using KRONOS (shaban2025foundation) cell-level representations (left) and ground truth (right). g, F1-scores for the transfer of cell types for VirTues, KRONOS (shaban2025foundation) and CA-MAE (kraus2024masked) and mean abundances of the shared markers.
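For readers unfamiliar with the macro-averaged F1-score used in panel d, the sketch below computes it from true and predicted cell type labels. The function name `macro_f1` and the toy labels are assumptions for illustration; the paper's evaluation code is not shown in this summary.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over all labels seen in either list."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        # F1 = 2TP / (2TP + FP + FN); the denominator is positive for any
        # label that occurs in y_true or y_pred.
        scores.append(2 * tp / (2 * tp + fp + fn))
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Toy labels, not taken from any dataset in the paper.
    truth = ["T cell", "T cell", "B cell", "B cell"]
    predicted = ["T cell", "B cell", "B cell", "B cell"]
    print(round(macro_f1(truth, predicted), 4))
```

Because every class contributes equally, rare cell types weigh as much as abundant ones, which is a common reason to prefer the macro average on imbalanced cell type annotations.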
Figure 4: Clinical applications of VirTues: risk stratification, diagnostic predictions and tissue retrieval by similarity. a-c, VirTues enables TME-based stratification of patients into risk groups. a, Stratification protocol. Cell-level representations of ER-positive patients from the METABRIC cohort (curtis2012genomic, danenberg2022breast) are computed and clustered using k-means. For each patient, a cluster proportion vector is calculated. These vectors are then clustered again via k-means to define patient groups, which are subsequently assigned risk levels. b, UMAP embeddings of cell summary tokens, colored by the assigned patient risk level. Overlaid KDE plots show the distributions of cells from two TME structures, (1) vascular stroma and (2) APC-enriched, as previously reported by danenberg2022breast to be associated with decreased and increased hazard ratios, respectively. c, Kaplan-Meier survival curves of identified high-risk and low-risk groups. We report the p-value of a log-rank test comparing the two curves. d, Risk ratio of each TME structure’s occurrence in the high-risk group. Lines indicate the 95% confidence intervals. e, We predict clinical patient features from the patch summary tokens of the entire tissue using ABMIL. f, Macro-averaged F1-scores for tissue-level prediction tasks on cords2024cancer, danenberg2022breast and wang2023spatial. For the response prediction on wang2023spatial, we test performance separately for biopsy samples collected before, during or after chemo- and immunotherapy. g, VirTues enables data-driven clinical decision support by retrieving similar patient cases from a database of tissue representations based on VirTues niche summary tokens and an optimal transport-based retrieval system. h-j, Comparison of retrieval statistics evaluated on cords2024cancer using the niche representations of VirTues, KRONOS (shaban2025foundation), CA-MAE (kraus2024masked) and ResNet (sorin2023single). The red dotted lines indicate the scores achieved by uniformly random retrieval for reference. h, Mean precision of the top-3 results for the retrieval of four clinical labels: cancer subtype, grade, presence of lymph node metastasis and relapse. P-values above the bars are computed based on a McNemar test for each clinical label and compare the number of hits achieved by VirTues with that of a random retrieval. i, Average cell type composition similarity between query and closest match quantified by the L1 distance between the cell type proportion vectors. j, Average molecular composition similarity between query and closest match measured by the sliced Wasserstein distance between the pixel-sized marker intensity vectors. k, Exemplary retrieval results for tissues in cords2024cancer. Each column shows the query tissue followed by the three closest matches. Tissues are depicted using their color-coded cell type masks. Colorbars next to the tissues indicate their proportional cell type compositions.
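Three simple quantities from this caption can be written down directly: the per-patient cluster proportion vector (panel a), the L1 distance between proportion vectors (panel i), and the precision of the top-3 retrieved cases (panel h). The sketch below is an illustration under assumptions; the function names and toy inputs are hypothetical, and the k-means clustering and optimal transport retrieval steps themselves are not reproduced here.

```python
from collections import Counter


def cluster_proportions(cluster_ids, num_clusters):
    """Fraction of a patient's cells assigned to each cluster."""
    counts = Counter(cluster_ids)
    total = len(cluster_ids)
    return [counts.get(k, 0) / total for k in range(num_clusters)]


def l1_distance(p, q):
    """L1 distance between two proportion vectors of equal length."""
    return sum(abs(a - b) for a, b in zip(p, q))


def precision_at_k(query_label, retrieved_labels, k=3):
    """Share of the top-k retrieved cases that carry the query's label."""
    top = retrieved_labels[:k]
    return sum(label == query_label for label in top) / k


if __name__ == "__main__":
    # Toy cluster assignments for the cells of two hypothetical patients.
    patient_a = cluster_proportions([0, 0, 1, 2], num_clusters=3)
    patient_b = cluster_proportions([1, 1, 1, 2], num_clusters=3)
    print(patient_a, patient_b, l1_distance(patient_a, patient_b))
    print(precision_at_k("A", ["A", "B", "A", "A"]))
```

For two proportion vectors the L1 distance lies between 0 (identical compositions) and 2 (disjoint compositions), which gives a fixed scale for reading panel i.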
Figure 5: Discovery of FM-based signatures predictive for treatment response and survival. a, Overview of the NeoTRIP cohort (gianni2022pathologic) imaged by wang2023spatial. The study included 138 breast cancer patients who received chemo-immunotherapy, of whom 67 showed a complete pathological response. For each patient, samples were collected at up to three time points: before, during, and after treatment. b, Distributions of cell states across treatment stages, shown separately for non-responders and responders. Distributions are visualized as kernel density estimates over the UMAP embeddings of VirTues' cell-level representations (left) and mean marker abundances (right). Arrows indicate the average distribution shifts between stages. c, Cell-level representations of pre-treatment samples are iteratively clustered using Leiden at different resolutions. Predictive values of the obtained clusters are evaluated individually. Subsequently, the four most predictive clusters are selected as signatures for joint prediction of treatment response. d, UMAP embeddings of pre-treatment cell-level representations colored by cell type. Dashed contour lines indicate distributions of selected response signatures (RS) and non-response signatures (NRS). e, Cross-validated AUROC scores for response prediction using the identified signatures on pre-treatment samples. Performance is compared with the spatial predictor system of wang2023spatial and with three univariate baselines: the ratios of tumor cells to CD4 T cells, CD8 T cells, and B cells. Grey brackets indicate the significance of the improvement of VirTues over the second-best stratification system, as validated by an independent t-test ($P<0.001$). f, Cell type composition of the selected response and non-response signatures, alongside overall cell type proportions. g, Comparison of neighborhood densities of selected cell types inside and outside the response signatures (RS). Tree diagrams show, from left to right, neighborhood composition of apoptotic cells inside versus outside RS1, CD4 T cells inside versus outside RS2, and B / Plasma cells inside versus outside RS1. Black arrows indicate densities within the signature, and gray arrows indicate densities outside the signature. Only the five most frequent neighboring cell types are shown. A solid arc indicates a significant difference between inside and outside the signature ($P<0.05$ for proportions z-test), while a dashed arc indicates a non-significant difference ($P>0.05$). Cell type masks visualize representative example niches. h, VirTues' representations permit the transfer of each identified predictive signature using a Random Forest classifier to a new breast cancer cohort. i, Proportions of cells with transferred signatures belonging to immune-inflamed, excluded and cold tumors. j, Risk stratification based on transferred signature proportions per patient (left) and tumor-to-CD4 T cell ratio (right). Kaplan-Meier survival curves are shown for each group, with the p-values from a log-rank test comparing the high- and low-risk groups indicated. k, Concordance index of cluster-derived risk groups compared against risk groups identified by meyer2025stratification and three baselines, namely risk scores derived from tertile-transformed ratios of tumor cells to CD4 T cells, CD8 T cells, and B cells.
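The concordance index reported in panel k measures how often a higher risk score goes with an earlier observed event. The sketch below follows the common Harrell-style definition and is an assumption about the metric's form, not the authors' evaluation code: the function name, the handling of tied risk scores (counted as 0.5), and the exclusion of pairs with tied times are choices made for this example.

```python
def concordance_index(times, events, risk_scores):
    """Harrell-style C-index: share of comparable pairs ranked correctly.

    times: follow-up time per patient
    events: 1 if the event was observed, 0 if the patient was censored
    risk_scores: higher score means higher predicted risk
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is comparable when patient i has an observed event
            # strictly before patient j's follow-up time.
            if events[i] and times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5  # tied risk scores count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


if __name__ == "__main__":
    # Toy data: two risk groups encoded as scores 1 (high) and 0 (low).
    times = [5, 8, 12, 20, 25]
    events = [1, 1, 0, 1, 0]
    groups = [1, 1, 0, 0, 0]
    print(concordance_index(times, events, groups))
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking; with coarse risk groups many pairs share the same score and contribute 0.5 each.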
