
MAD: Microenvironment-Aware Distillation -- A Pretraining Strategy for Virtual Spatial Omics from Microscopy

Jiashu Han, Kunzan Liu, Yeojin Kim, Saurabh Sinha, Sixian You

Abstract

Bridging microscopy and omics would allow us to read molecular states from images, at single-cell resolution and tissue scale, without the cost and throughput limits of omics technologies. Self-supervised pretraining offers a scalable approach that requires minimal labels, yet how to encode single-cell identity within tissue environments, and how much biological information such models can capture, remain open questions. Here, we introduce MAD (microenvironment-aware distillation), a pretraining strategy that learns cell-centric embeddings by jointly self-distilling the morphology view and the microenvironment view of the same indexed cell into a unified embedding space. Across diverse tissues and imaging modalities, MAD achieves state-of-the-art prediction performance on downstream tasks including cell subtyping, transcriptomic prediction, and bioinformatic inference. MAD even outperforms foundation models of similar parameter count that were trained on substantially larger datasets. These results demonstrate that MAD's dual-view joint self-distillation effectively captures the complexity and diversity of cells within tissues. Together, they establish MAD as a general tool for representation learning in microscopy, enabling virtual spatial omics and biological insights from vast microscopy datasets.
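The abstract and Figure 1 describe MAD's objective only at a high level: global crops go to a teacher, local crops to a student, and class tokens from the morphology and microenvironment views are aligned and then concatenated. The sketch below illustrates one plausible reading under stated assumptions. It assumes a DINO-style self-distillation loss (centered, sharpened teacher softmax versus student softmax) computed per view and simply summed across the two views; whether MAD's joint objective also contains cross-view terms is not stated here. The temperatures (0.04, 0.1) and centering momentum (0.9) are DINO defaults, not values reported for MAD, and all function names are hypothetical.

```python
import numpy as np

VIEWS = ("morphology", "microenvironment")

def log_softmax(x, temp):
    """Row-wise log-softmax with temperature, numerically stabilized."""
    z = x / temp
    z = z - z.max(axis=-1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=-1, keepdims=True))

def distill_loss(teacher_logits, student_logits, center, t_temp=0.04, s_temp=0.1):
    """DINO-style cross-entropy between teacher and student outputs for one view.

    teacher_logits: (n_global, batch, K) projection-head outputs on global crops.
    student_logits: (n_local, batch, K) projection-head outputs on local crops.
    Index b along the batch axis refers to the same indexed cell in every array.
    """
    # Teacher targets are centered and sharpened (low temperature); in a real
    # training loop they would be treated as constants (no gradient).
    p_t = np.exp(log_softmax(teacher_logits - center, t_temp))
    log_p_s = log_softmax(student_logits, s_temp)
    # Average the cross-entropy over every (global crop, local crop) pair.
    losses = [-(p_t[g] * log_p_s[l]).sum(axis=-1).mean()
              for g in range(p_t.shape[0]) for l in range(log_p_s.shape[0])]
    return float(np.mean(losses))

def joint_loss(teacher_out, student_out, centers):
    """Hypothetical joint objective: per-view distillation losses summed over both views."""
    return sum(distill_loss(teacher_out[v], student_out[v], centers[v]) for v in VIEWS)

def update_center(center, teacher_logits, momentum=0.9):
    """EMA of the mean teacher output (as in DINO), used to discourage collapse."""
    return momentum * center + (1 - momentum) * teacher_logits.mean(axis=(0, 1))

def cell_embedding(cls_tokens):
    """Concatenate per-view class tokens into one feature vector per cell (Fig. 1d)."""
    return np.concatenate([cls_tokens[v] for v in VIEWS], axis=-1)
```

Keeping a separate loss per view lets each view's student track its own teacher, while the unified cell representation is formed afterwards by concatenation, which matches the description in Figure 1d.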

Paper Structure (2 sections, 11 equations, 6 figures)

Table of Contents

  1. Introduction
  2. Results

Figures (6)

  • Figure 1: Principles of pretraining with MAD (Microenvironment-Aware Distillation). a, Example tissue image with an indexed single cell highlighted; the zoom-in illustrates the cell’s subcellular morphology together with surrounding tissue context. b, The dataset is prepared with morphological and microenvironmental images, from which multiple paired global and local crops are generated; the global crops are sent to the teacher network and the local crops to the student network. c, The output class tokens are aligned with a dual-view joint self-distillation objective to match the teacher (global) and student (local) representations across morphological and microenvironmental views. d, After pretraining, the single-cell MAD representation is obtained by concatenating the output class tokens from the morphological and microenvironmental views into a unified feature vector. e, The resulting MAD embeddings can be integrated with task-specific decoders to support downstream analyses, including prediction of cell type/state, prediction of gene expression, bioinformatic inference, and unsupervised cell subtyping.
  • Figure 2: MAD effectively integrates morphological and microenvironmental information to define cellular identities. a–c, UMAP projections of single-cell embeddings learned from morphology-only views, microenvironment-only views, and joint morphology-and-microenvironment (MAD) pretraining, respectively. Points are colored by annotated cell types; the adjusted Rand index (ARI) is shown for each embedding. d,e, Training dynamics showing self-distillation loss and recall over training iterations, respectively. f–h, The same UMAPs as in a–c, with representative cells from three macrophage subtypes highlighted. i–k, Example image crops corresponding to the highlighted cells for each subtype in f–h. Scale bars: 10 µm.
  • Figure 3: Benchmarking MAD on cell subtyping. a–f, Representative images from six benchmark datasets spanning cell culture (top row) and tissue slides (bottom row), showing Human Protein Atlas, Cell Painting (LINCS), Cell Painting (JUMP), human lung cancer, human lymph node, and human ovarian cancer, respectively. g, Summary of cell-subtype prediction accuracy across datasets (radar plot). Above dashed line: cell culture datasets. Below dashed line: tissue slide datasets. h, Four-color fluorescence imaging from the human ovarian cancer tissue dataset. i–k, Spatial maps and corresponding zoomed-in regions of cell-type assignments for the same ovarian cancer section, showing annotations derived from Xenium, MAD, and CellDINO, respectively. l–n, Zoomed-in comparisons of the spatial relationships between macrophages (red) and three tumor cell subtypes: SOX2-OT+ tumor cells, VEGFA+ tumor cells, and proliferative tumor cells, respectively. Scale bars: 500 µm.
  • Figure 4: MAD enables prediction of gene expression profiles at single-cell resolution. a, Per-gene Pearson correlation comparison between the benchmark method CellDINO (x-axis) and MAD (y-axis). b, Per-gene Pearson correlation across a panel of 126 marker genes, sorted by MAD performance (highest to lowest); bars represent MAD and lines represent the benchmark method CellDINO. See Supplementary Note 1 for the complete gene panel. c,d, Spatial expression maps for LUM and C11orf96, shown for CellDINO, MAD, and Xenium measurements, with corresponding zoomed-in regions. The color bar indicates normalized gene expression levels. e, Comparison of mean squared error (MSE) of gene expression prediction (N = 126 genes) across different levels of within-patch heterogeneity. f,g, Representative examples of low and high within-patch gene expression variation, respectively. h,i, Differential gene expression analysis derived from MAD predictions and from Xenium measurements, respectively. Scale bars: 10 µm.
  • Figure 5: Image-derived MAD embeddings recapitulate the tissue transcriptomic manifold. a–c, Using the human ovarian cancer dataset, UMAPs of embeddings colored by annotated cell types from VQ-VAE, MAD, and spatial transcriptomics from Xenium, respectively. d,e, Cell-type centroid alignment between embedding and spatial transcriptomic spaces measured by canonical correlation analysis (CCA) for VQ-VAE and MAD. f–i, Benchmark metrics comparing VQ-VAE, morphology-only, microenvironment-only, and MAD: ARI, linear classification accuracy, kNN classification accuracy, and cluster purity, respectively. j, MAD embedding UMAP of fibroblasts with stromal-associated and tumor-associated fibroblasts indicated. k, Louvain clustering on the MAD embedding. l, Selected Louvain clusters 12 and 4 highlighted in MAD embedding space. m, The same cells highlighted in spatial transcriptomic embedding space. n, Spatial mapping of the selected clusters back onto the tissue section. o,q, Volcano plots for differential expression of Louvain cluster 12 vs. stromal-associated fibroblasts and Louvain cluster 4 vs. tumor-associated fibroblasts. p,r, Gene set enrichment results for the corresponding comparisons.
  • ...and 1 more figure
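Figures 2, 4, and 5 report several standard evaluation metrics: per-gene Pearson correlation and mean squared error between predicted and measured expression (Figure 4), the adjusted Rand index (Figures 2 and 5f), and cluster purity (Figure 5i). A minimal NumPy sketch of their textbook definitions follows, as a starting point for reproducing comparisons of this kind. It is not the paper's evaluation code: the exact preprocessing (expression normalization, gene filtering, how clusters are derived from embeddings) is not given in this summary, and the function names are hypothetical.

```python
import numpy as np

def per_gene_pearson(pred, meas):
    """Pearson r between predicted and measured expression, one value per gene.

    pred, meas: (n_cells, n_genes) arrays; genes with zero variance return NaN.
    """
    pc = pred - pred.mean(axis=0)
    mc = meas - meas.mean(axis=0)
    num = (pc * mc).sum(axis=0)
    denom = np.sqrt((pc ** 2).sum(axis=0) * (mc ** 2).sum(axis=0))
    return np.divide(num, denom, out=np.full_like(num, np.nan), where=denom > 0)

def mse(pred, meas):
    """Mean squared error over all cells and genes."""
    return float(((pred - meas) ** 2).mean())

def cluster_purity(cluster_ids, labels):
    """Fraction of cells that carry the majority label of their cluster."""
    cluster_ids, labels = np.asarray(cluster_ids), np.asarray(labels)
    total = 0
    for c in np.unique(cluster_ids):
        _, counts = np.unique(labels[cluster_ids == c], return_counts=True)
        total += counts.max()
    return total / len(labels)

def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand index between two partitions (Hubert-Arabie form)."""
    _, a = np.unique(np.asarray(labels_a), return_inverse=True)
    _, b = np.unique(np.asarray(labels_b), return_inverse=True)
    a, b = a.ravel(), b.ravel()
    # Contingency table: table[i, j] = number of cells in cluster i and class j.
    table = np.zeros((a.max() + 1, b.max() + 1), dtype=np.int64)
    np.add.at(table, (a, b), 1)

    def comb2(x):
        return x * (x - 1) / 2.0

    sum_ij = comb2(table).sum()
    sum_a = comb2(table.sum(axis=1)).sum()
    sum_b = comb2(table.sum(axis=0)).sum()
    expected = sum_a * sum_b / comb2(len(a))
    max_index = (sum_a + sum_b) / 2.0
    if max_index == expected:  # degenerate case, e.g. both partitions trivial
        return 1.0
    return float((sum_ij - expected) / (max_index - expected))
```

ARI depends only on how cells are grouped, not on the label names, so a relabeled but otherwise identical clustering scores 1.0; cluster purity, by contrast, rewards many small clusters and is best read alongside ARI.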