LEMON: a foundation model for nuclear morphology in Computational Pathology

Loïc Chadoutaud, Alice Blondel, Hana Feki, Jacqueline Fontugne, Emmanuel Barillot, Thomas Walter

Abstract

Computational pathology relies on effective representation learning to support cancer research and precision medicine. Although self-supervised learning has driven major progress at the patch and whole-slide image levels, representation learning at the single-cell level remains comparatively underexplored, despite its importance for characterizing cell types and cellular phenotypes. We introduce LEMON (Learning Embeddings from Morphology Of Nuclei), a self-supervised foundation model for scalable single-cell image representation learning. Trained on millions of cell images from diverse tissues and cancer types, LEMON learns robust and versatile morphological representations that support large-scale single-cell analyses in pathology. We evaluate LEMON on five benchmark datasets across a range of prediction tasks and show that it provides strong performance, highlighting its potential as a new paradigm for cell-level computational pathology. Model weights are available at https://huggingface.co/aliceblondel/LEMON.

Paper Structure

This paper contains 32 sections, 5 equations, 8 figures, 12 tables.

Figures (8)

  • Figure 1: Overview of the LEMON framework.
  • Figure 2: Performance of pretraining strategies versus FLOPs. Left: mean balanced accuracy across NuCLS (super, main, raw), MIDOG25, and PanNuke for classification. Right: mean Pearson correlation coefficient (PCC) across breast, lung, and bowel datasets for regression. Marker size indicates model parameter count; color indicates pretraining category.
  • Figure 3: Performance of models vs. training data composition. Mean balanced accuracy (error bars represent standard errors) across MIDOG25 and the three NuCLS classification tasks. Top: Performance by training dataset size; models improve with larger datasets and level off near 1M images. Bottom: With the total number of images held constant, models perform better as diversity increases (more organs of origin and more slides).
  • Figure 4: Alignment between morphological embeddings and marker-gene expression in a bowel Xenium tissue section (HEST id: TENX147). Left: t-SNE projection of LEMON morphological embeddings. Right: normalized expression maps for marker genes, showing their spatial distribution within the morphology-derived manifold.
  • Figure S1: Examples of nuclei images under different augmentation strategies, showing three random augmentations drawn from either the MoCo v3 augmentation or ours.
  • ...and 3 more figures
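
The captions above report mean balanced accuracy for the classification tasks and mean Pearson correlation coefficient (PCC) for the regression tasks. The following is a minimal sketch of how these two metrics are conventionally defined. It is not the authors' evaluation code: the function names `balanced_accuracy` and `pearson_cc` and the toy arrays are illustrative assumptions.

```python
import numpy as np


def balanced_accuracy(y_true, y_pred):
    """Average of per-class recall, taken over the classes present in y_true."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    # Recall for class c: fraction of samples of class c predicted as c.
    recalls = [np.mean(y_pred[y_true == c] == c) for c in np.unique(y_true)]
    return float(np.mean(recalls))


def pearson_cc(x, y):
    """Pearson correlation coefficient between two 1-D arrays."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc = x - x.mean()  # center both variables
    yc = y - y.mean()
    return float((xc @ yc) / np.sqrt((xc @ xc) * (yc @ yc)))


# Toy, made-up labels: class 0 recall is 3/4, class 1 recall is 1/2.
y_true = [0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 1, 1, 0]
ba = balanced_accuracy(y_true, y_pred)

# Toy, made-up regression targets and predictions (perfectly linear).
pcc = pearson_cc([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])

print(f"balanced accuracy = {ba:.3f}, PCC = {pcc:.3f}")
```

Balanced accuracy weights every class equally regardless of its frequency, which matters when classes are imbalanced. The "mean" in the captions then refers to averaging such per-dataset scores across the listed benchmarks.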