PixCell: A generative foundation model for digital histopathology images

Srikar Yellapragada, Alexandros Graikos, Zilinghan Li, Kostas Triaridis, Varun Belagali, Tarak Nath Nandi, Karen Bai, Beatrice S. Knudsen, Tahsin Kurc, Rajarsi R. Gupta, Prateek Prasanna, Ravi K Madduri, Joel Saltz, Dimitris Samaras

TL;DR

PixCell tackles the core challenges of histopathology data scarcity and privacy by introducing a diffusion-based generative foundation model trained on PanCan-30M, with progressive, SSL-embedding-conditioned training to synthesize high-quality, semantically faithful H&E patches. The model enables data augmentation that boosts downstream classification, supports privacy-preserving data sharing through synthetic data, and extends to zero-shot virtual staining (H&E→IHC) via embedding translation and lightweight adapters. Across extensive evaluations, PixCell delivers superior image realism (low Fréchet distances across pathology encoders), preserves tissue semantics, and achieves diagnostically relevant performance on synthetic images, including BRCA subtyping accuracy. The work also demonstrates practical applications such as synthetic SSL pretraining and synthetic-data pooling for multi-institution learning, and provides an open release of synthetic data and model weights to accelerate computational pathology research.

Abstract

The digitization of histology slides has revolutionized pathology, providing massive datasets for cancer diagnosis and research. Self-supervised and vision-language models have been shown to effectively mine large pathology datasets to learn discriminative representations. On the other hand, there are unique problems in pathology, such as annotated data scarcity, privacy regulations in data sharing, and inherently generative tasks like virtual staining. Generative models, capable of synthesizing realistic and diverse images, present a compelling solution to address these problems through image synthesis. We introduce PixCell, the first generative foundation model for histopathology images. PixCell is a diffusion model trained on PanCan-30M, a large, diverse dataset derived from 69,184 H&E-stained whole slide images of various cancer types. We employ a progressive training strategy and a self-supervision-based conditioning that allows us to scale up training without any human-annotated data. By conditioning on real slides, the synthetic images capture the properties of the real data and can be used as data augmentation for small-scale datasets to boost classification performance. We prove the foundational versatility of PixCell by applying it to two generative downstream tasks: privacy-preserving synthetic data generation and virtual IHC staining. PixCell's high-fidelity conditional generation enables institutions to use their private data to synthesize highly realistic, site-specific surrogate images that can be shared in place of raw patient data. Furthermore, using datasets of roughly paired H&E-IHC tiles, we learn to translate PixCell's conditioning from H&E to multiple IHC stains, allowing the generation of IHC images from H&E inputs. Our trained models are publicly released to accelerate research in computational pathology.

Paper Structure

This paper contains 49 sections, 20 figures, 11 tables.

Figures (20)

  • Figure 1: PixCell overview. a. Our training data comprises a large collection of 69,184 WSIs spanning 28 tissue types. b. We progressively increase image resolution during training: we start by training the model on 256x256 images, conditioning the generation process on the image embedding extracted from a pretrained pathology foundation model (Stage 1). We continue training the same model with 512x512 (Stage 2) and 1024x1024 images (Stage 3), using embeddings from all 256x256 tiles contained in the larger image. c. PixCell generates images that preserve key features of the reference tiles and are perceived as highly similar by pretrained pathology image encoders. Using these synthetic images for data augmentation improves the performance of downstream classifiers. Using an inference-time algorithm, we can scale the generated images to 4096x4096 pixels. d. Synthetic images can serve as a drop-in replacement for real data in the training of self-supervised foundation models. This enables privacy-preserving data sharing between institutions. e. Using PixCell with small datasets of paired H&E and IHC images enables virtual staining. Our virtual staining pipeline leads to higher accuracy in the diagnostic labels predicted by both automated and human evaluators. Figure best viewed at $3\times$ magnification.
  • Figure 2: a. PixCell consistently achieves lower (better) Fréchet distance scores across multiple datasets and pathology-specific encoders. b. Two expert pathologists rated PixCell's synthetic images as having higher fidelity across all five qualitative criteria. c. Pathologist prediction accuracy for breast cancer subtyping (lobular vs. ductal) on synthetic images (N=18 WSIs) is nearly identical to that on real images. Error bars denote the 95% confidence interval (CI).
  • Figure 3: (a) The embeddings of PixCell's synthetic images are very similar to the embeddings of real images across multiple encoders, which leads to (b) synthetic images preserving tissue semantics. (c) Augmenting training data with synthetic images consistently improves the F1 scores of the three state-of-the-art downstream classifiers. Error bars represent the $95\%$ CI.
  • Figure 4: Fréchet distance between the generated and real IHC tiles using different encoders. For pathology-specific encoders, we find that PixCell achieves the best scores.
  • Figure 5: Fréchet distance between the generated and real IHC tiles using different encoders. Previous methods benchmarked by Klockner et al. (2025) perform worse than the USIGAN baseline. PixCell outperforms all previous methods when measuring distance with pathology-specific encoders.
  • ...and 15 more figures
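
Several of the captions above report Fréchet distances computed on encoder embeddings. As a rough illustration of what that metric measures, the sketch below computes the standard Gaussian-fit Fréchet distance, $\|\mu_r - \mu_s\|^2 + \mathrm{Tr}(\Sigma_r + \Sigma_s - 2(\Sigma_r \Sigma_s)^{1/2})$, between two sets of embeddings. It is not the authors' evaluation code: the function name `frechet_distance`, the array shapes, and the random stand-in embeddings are assumptions made for illustration, and real use would pass features extracted by a pathology encoder.

```python
import numpy as np


def frechet_distance(feats_real: np.ndarray, feats_synth: np.ndarray) -> float:
    """Frechet distance between Gaussians fitted to two embedding sets.

    Both inputs are assumed to have shape (n_samples, dim) with dim >= 2.
    """
    mu_r = feats_real.mean(axis=0)
    mu_s = feats_synth.mean(axis=0)
    cov_r = np.cov(feats_real, rowvar=False)
    cov_s = np.cov(feats_synth, rowvar=False)

    # Tr((cov_r cov_s)^{1/2}) equals the sum of the square roots of the
    # eigenvalues of cov_r @ cov_s, which are real and non-negative for
    # positive semi-definite inputs (up to floating-point noise).
    eigvals = np.linalg.eigvals(cov_r @ cov_s)
    trace_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()

    mean_term = float(np.sum((mu_r - mu_s) ** 2))
    cov_term = float(np.trace(cov_r) + np.trace(cov_s) - 2.0 * trace_sqrt)
    return mean_term + cov_term


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Stand-in embeddings (hypothetical); replace with encoder features.
    real = rng.normal(size=(500, 8))
    synth = rng.normal(loc=0.1, size=(500, 8))
    print(f"Frechet distance: {frechet_distance(real, synth):.4f}")
```

Lower values mean the two embedding distributions are closer. Because the result depends on the encoder that produced the embeddings, scores are only comparable when they are computed with the same encoder.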