UniCell: Universal Cell Nucleus Classification via Prompt Learning

Junjia Huang, Haofeng Li, Xiang Wan, Guanbin Li

TL;DR

UniCell tackles cross-dataset nucleus classification by learning a single universal model that detects and classifies nuclei across diverse pathology datasets with inconsistent annotations. It combines a DETR-based end-to-end architecture with a Dynamic Prompt Module (DPM) that injects dataset- and category-level semantics through dataset prompts and a Category Memory Bank, enabling knowledge sharing across $D$ datasets and $C$ categories. A Contrastive DeNoising Query mechanism accelerates training by generating queries from noisy centroids, while inference relies on learnable content queries. Per-dataset prediction heads handle the differing label sets. Empirical results on four benchmarks show state-of-the-art performance in both detection and classification, and ablations confirm the effectiveness of the DPM, the best local attention depth ($L=3$), and the advantage of the Feature-Enhancing strategy for feature refinement. The approach reduces data fragmentation across datasets and offers a scalable path toward practical, cross-domain histopathology analysis.
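The noisy-centroid idea can be illustrated with a minimal sketch. It assumes a common contrastive denoising scheme in which positive queries receive a small jitter around a ground-truth centroid and negative queries receive a larger one. The function name `make_denoising_queries` and the noise scales `pos_scale` and `neg_scale` are hypothetical and are not taken from the paper; the actual UniCell implementation may differ.

```python
import random


def make_denoising_queries(centroids, pos_scale=0.02, neg_scale=0.08, seed=0):
    """Build positive and negative denoising queries from ground-truth centroids.

    centroids: list of (x, y) pairs in normalized [0, 1] image coordinates.
    pos_scale, neg_scale: hypothetical noise magnitudes (assumptions, not paper values).
    """
    rng = random.Random(seed)

    def clamp(v):
        # Keep jittered coordinates inside the image.
        return min(1.0, max(0.0, v))

    positives, negatives = [], []
    for x, y in centroids:
        # Positive query: small offset; the decoder should recover this nucleus.
        positives.append((
            clamp(x + rng.uniform(-pos_scale, pos_scale)),
            clamp(y + rng.uniform(-pos_scale, pos_scale)),
        ))
        # Negative query: larger offset with a random sign per axis;
        # the decoder should learn to reject it as "no object".
        dx = rng.uniform(pos_scale, neg_scale) * rng.choice((-1, 1))
        dy = rng.uniform(pos_scale, neg_scale) * rng.choice((-1, 1))
        negatives.append((clamp(x + dx), clamp(y + dy)))
    return positives, negatives
```

These queries would only be used during training. At inference no ground-truth centroids exist, so the model falls back to its learnable content queries, as the summary above states.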

Abstract

The recognition of multi-class cell nuclei can significantly facilitate the process of histopathological diagnosis. Numerous pathological datasets are currently available, but their annotations are inconsistent. Most existing methods require individual training on each dataset to deduce the relevant labels and do not use common knowledge across datasets, which restricts the quality of recognition. In this paper, we propose a universal cell nucleus classification framework (UniCell), which employs a novel prompt learning mechanism to uniformly predict the corresponding categories of pathological images from different dataset domains. In particular, our framework adopts an end-to-end architecture for nuclei detection and classification, and utilizes flexible prediction heads to adapt to various datasets. Moreover, we develop a Dynamic Prompt Module (DPM) that exploits the properties of multiple datasets to enhance features. The DPM first integrates the embeddings of datasets and semantic categories, and then employs the integrated prompts to refine image representations, efficiently harvesting the shared knowledge among the related cell types and data sources. Experimental results demonstrate that the proposed method achieves state-of-the-art results on four nucleus detection and classification benchmarks. Code and models are available at https://github.com/lhaof/UniCell
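The two DPM steps described in the abstract (integrate dataset and category embeddings, then refine image representations with the integrated prompts) can be sketched with plain attention. This is a simplified illustration under stated assumptions: single-head attention, no learned projections, and residual addition. The names `attend` and `dynamic_prompt` are hypothetical, and the released code may be organized quite differently.

```python
import math


def softmax(scores):
    # Numerically stable softmax over a list of floats.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, keys, values):
    """Single-head scaled dot-product attention for one query vector."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    return [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]


def add(a, b):
    return [x + y for x, y in zip(a, b)]


def dynamic_prompt(prompt_tokens, category_bank, image_tokens):
    """Hypothetical two-step prompt module (assumption, not the paper's exact design).

    Step 1: each dataset prompt token attends to the category memory bank.
    Step 2: each image token attends to the updated prompt tokens.
    Both steps use a residual connection.
    """
    updated = [add(p, attend(p, category_bank, category_bank)) for p in prompt_tokens]
    enhanced = [add(t, attend(t, updated, updated)) for t in image_tokens]
    return updated, enhanced
```

In a real model each step would carry learned query, key, and value projections, and the image tokens would come from multi-scale backbone features. The sketch only shows the direction of information flow: categories into dataset prompts, then prompts into image features.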

Paper Structure (14 sections, 7 equations, 6 figures, 2 tables)

Figures (6)

  • Figure 1: Illustration of universal multi-dataset cell nucleus classification. The lack of a unified standard for annotating nucleus types hinders the efficient utilization of data and labels. For example, the Lizard dataset (blue) shares three classes (Neu., Epi., Lym.) with the MoNuSAC dataset (green). In addition, some categories stand in an inclusion relationship, such as connective and stromal cells. Our approach utilizes multiple datasets and their associated labels as prompts for training a unified model.
  • Figure 2: The framework of our proposed UniCell. The multi-scale features and the Dataset ID are fed into the Dynamic Prompt Module to obtain the dataset-specific prompt. Note that both the deep supervision and the query refinement in the decoder layers are omitted for readability.
  • Figure 3: The proposed Dynamic Prompt Module. Both the dataset prompt and the category memory bank are tokenized and embedded from prior textual sequences. We update the dataset prompts with the embeddings in the category memory bank, and use the updated prompts to enhance the input representations.
  • Figure 4: Qualitative comparison on the Lizard dataset. Five types of cells are marked with dilated nucleus centroids in five different colors. The results show that the category distribution predicted by our method is the closest to that of the ground truth.
  • Figure 5: Different ways of using multiple datasets. 'UniCell*' denotes UniCell with one head (instead of four), trained independently on each dataset after removing the DPM. 'UniCell*(BP)' denotes training 'UniCell*' from weights pre-trained on a binary detection and classification task. 'UniCell$\dagger$' trains a UniCell* model on the training data of all four sources by merging the categories of all datasets.
  • ...and 1 more figures
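
The label overlap in Figure 1 and the category-merging baseline in Figure 5 can be made concrete with a small sketch. It is only an illustration: the three shared class names (Neu., Epi., Lym.) come from the Figure 1 caption, while `other_A` and `other_B` are placeholder names standing in for the remaining classes, and the helpers `shared_classes` and `merge_label_spaces` are hypothetical.

```python
def shared_classes(dataset_labels, a, b):
    """Return the class names that two datasets have in common, sorted."""
    return sorted(set(dataset_labels[a]) & set(dataset_labels[b]))


def merge_label_spaces(dataset_labels):
    """Build one merged label list plus per-dataset index maps into it.

    This mirrors a single-head model trained on merged categories; a
    per-dataset-head model would instead keep len(labels) outputs per dataset.
    """
    merged = []
    for labels in dataset_labels.values():
        for name in labels:
            if name not in merged:
                merged.append(name)
    index_maps = {
        ds: [merged.index(name) for name in labels]
        for ds, labels in dataset_labels.items()
    }
    return merged, index_maps


# Placeholder label sets: only Neu., Epi., Lym. are taken from the caption.
labels = {
    "Lizard": ["Neu.", "Epi.", "Lym.", "other_A"],
    "MoNuSAC": ["Epi.", "Lym.", "Neu.", "other_B"],
}
overlap = shared_classes(labels, "Lizard", "MoNuSAC")
merged, index_maps = merge_label_spaces(labels)
```

Merging by name treats identically named classes as one category, but it cannot express inclusion relationships such as connective versus stromal cells, which is one motivation for keeping separate heads and sharing knowledge through prompts instead.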