CytoFM: The first cytology foundation model

Vedrana Ivezić, Ashwath Radhachandran, Ekaterina Redekop, Shreeram Athreya, Dongwoo Lee, Vivek Sant, Corey Arnold, William Speier

TL;DR

CytoFM introduces the first cytology-specific self-supervised foundation model trained with iBOT on a diverse, multi-institutional corpus of ~1.4 million patches across breast, cervix, and thyroid. Using a frozen ViT feature extractor combined with ABMIL, the approach evaluates on three downstream tasks and shows improved performance over non-cytology baselines on two tasks, with strong evidence of cytology-relevant representation through attention and embedding visualizations. The work demonstrates robust generalization to unseen data and highlights the potential of cytology-focused foundation models to provide domain-specific, transferable features for diagnostic support without task-specific fine-tuning.

Abstract

Cytology is essential for cancer diagnostics and screening due to its minimally invasive nature. However, the development of robust deep learning models for digital cytology is challenging due to the heterogeneity in staining and preparation methods of samples, differences across organs, and the limited availability of large, diverse, annotated datasets. Developing a task-specific model for every cytology application is impractical, and non-cytology-specific foundation models struggle to generalize to tasks in this domain, where the emphasis is on cell morphology. To address these challenges, we introduce CytoFM, the first cytology self-supervised foundation model. Using iBOT, a self-supervised Vision Transformer (ViT) training framework incorporating masked image modeling and self-distillation, we pretrain CytoFM on a diverse collection of cytology datasets to learn robust, transferable representations. We evaluate CytoFM on multiple downstream cytology tasks, including breast cancer classification and cell type identification, using an attention-based multiple instance learning framework. Our results demonstrate that CytoFM performs better on two out of three downstream tasks than existing foundation models pretrained on histopathology (UNI) or natural images (iBOT-Imagenet). Visualizations of learned representations demonstrate that our model is able to attend to cytologically relevant features. Despite a small pre-training dataset, CytoFM's promising results highlight the ability of task-agnostic pre-training approaches to learn robust and generalizable features from cytology data.
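
The attention-based multiple instance learning step named in the abstract can be illustrated with a minimal NumPy sketch. It assumes the standard non-gated attention pooling form, in which each patch feature receives a softmax-normalised score and the image embedding is the weighted sum of patch features. The abstract does not specify which ABMIL variant is used, so the function name `abmil_pool`, the feature sizes, and the random weights below are illustrative assumptions, not the authors' implementation. In practice `V` and `w` would be learned on the downstream task while the feature extractor stays frozen.

```python
import numpy as np


def abmil_pool(H, V, w):
    """Attention pooling over a bag of patch features (hypothetical helper).

    H: (n_patches, d) features from a frozen encoder
    V: (d, k) hidden projection, w: (k,) attention vector
    Returns the bag embedding (d,) and the attention weights (n_patches,).
    """
    scores = np.tanh(H @ V) @ w        # one scalar score per patch
    scores = scores - scores.max()     # shift for a numerically stable softmax
    attn = np.exp(scores) / np.exp(scores).sum()
    return attn @ H, attn              # weighted sum of patch features


# Stand-in data: 8 patches with 16-dim features (sizes are assumptions).
rng = np.random.default_rng(0)
H = rng.normal(size=(8, 16))
V = rng.normal(size=(16, 4))
w = rng.normal(size=4)

bag, attn = abmil_pool(H, V, w)
```

The attention weights sum to one, so the bag embedding is a convex combination of patch features; with an all-zero `w` the pooling reduces to a plain mean over patches.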

Paper Structure

This paper contains 10 sections, 4 figures, 2 tables.

Figures (4)

  • Figure 1: CytoFM: the first cytology-specific foundation model. Developing this model requires the curation of an unlabeled cytology-specific dataset, which is used to pre-train a ViT using the iBOT framework (a). The trained ViT, CytoFM, is then used to extract features for cytology image patches; the patch features for an image are aggregated into a bag, and an ABMIL framework creates a single embedding for the image to be used in downstream tasks (b).
  • Figure 2: Filtering of patches from a WSI for our private thyroid dataset. A WSI is split into patches, and ThyVGG is used to predict the probability that each patch contains relevant information. The 1500 patches with the highest probability from a slide are used in the thyroid cytology dataset.
  • Figure 3: CytoFM attentions. Selected attention maps from the last layer of the model. The model attends to relevant cytological features such as the nuclei, mitotic activity, nuclei boundaries, and morphology.
  • Figure 4: UMAP of extracted features for the MLBC (top row) and FNAC2019 (bottom row) datasets. Less dispersion in the features is seen in our model, CytoFM.
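
The patch filtering described in the Figure 2 caption amounts to a per-slide top-k selection on predicted probabilities. The sketch below shows only that selection step, assuming the per-patch probabilities have already been computed by a classifier such as ThyVGG. The function name `select_top_patches` and the toy probabilities are hypothetical; only the cutoff of 1500 comes from the caption.

```python
import numpy as np


def select_top_patches(probs, k=1500):
    """Return indices of the k highest-probability patches, best first.

    probs: per-patch probabilities for one slide (any 1-D sequence).
    If the slide has fewer than k patches, all indices are returned.
    """
    probs = np.asarray(probs, dtype=float)
    k = min(k, probs.size)
    order = np.argsort(-probs, kind="stable")  # descending by probability
    return order[:k]


# Toy example with four patches and k=2.
probs = [0.1, 0.9, 0.5, 0.7]
top = select_top_patches(probs, k=2)  # -> indices [1, 3]
```

A stable sort keeps the original patch order among ties, which makes the selection reproducible across runs.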