A Survey on Cell Nuclei Instance Segmentation and Classification: Leveraging Context and Attention

João D. Nunes, Diana Montezuma, Domingos Oliveira, Tania Pereira, Jaime S. Cardoso

TL;DR

This paper surveys the role of context and attention in cell nuclei instance segmentation and classification from H&E-stained brightfield microscopy, arguing that domain-rich context could guide more robust models. It analyzes a broad set of building blocks (SE/CBAM, AFPN, relation networks, transformers) and their medical-imaging applications, with a detailed case study extending Mask-RCNN and HoVer-Net to incorporate context modules. The review highlights that context and attention can improve performance in semantic and dense-prediction tasks, but that gains for nucleus instance segmentation and classification are variable and highly architecture-dependent, underscoring the need for domain-tailored designs and explicit context annotations. It also outlines future directions, including causal representation learning, graph-based reasoning, multimodal data integration, and annotation-efficient learning, to advance clinically applicable nuclear analysis. Overall, the work emphasizes careful experimental design and the incorporation of domain knowledge to translate context-aware methods into reliable, clinically useful tools in computational pathology.

Abstract

Manually annotating nuclei from gigapixel Hematoxylin and Eosin (H&E)-stained Whole Slide Images (WSIs) is a laborious and costly task, so automated algorithms for cell nuclei instance segmentation and classification could alleviate the workload of pathologists and clinical researchers and, at the same time, facilitate the automatic extraction of clinically interpretable features. However, due to the high intra- and inter-class variability of nuclei morphological and chromatic features, as well as the susceptibility of H&E stains to artefacts, state-of-the-art algorithms cannot detect and classify instances with the necessary performance. In this work, we hypothesise that context and attention inductive biases in artificial neural networks (ANNs) could increase the generalization of algorithms for cell nuclei instance segmentation and classification. We conduct a thorough survey on context and attention methods for cell nuclei instance segmentation and classification from H&E-stained microscopy imaging, while providing a comprehensive discussion of the challenges being tackled with context and attention. We also illustrate some limitations of current approaches and present ideas for future research. As a case study, we extend both a general instance segmentation and classification method (Mask-RCNN) and a tailored cell nuclei instance segmentation and classification model (HoVer-Net) with context- and attention-based mechanisms, and perform a comparative analysis on a multi-centre colon nuclei identification and counting dataset. Although pathologists rely on context at multiple levels while paying attention to specific Regions of Interest (RoIs) when analysing and annotating WSIs, our findings suggest that translating this domain knowledge into algorithm design is no trivial task, and that fully exploiting these mechanisms requires a better scientific understanding of these methods.

Paper Structure

This paper contains 39 sections, 15 equations, 15 figures, 11 tables.

Figures (15)

  • Figure 1: Example of a WSI tile from the Lizard dataset (CRC) Graham2019a. Left: Original tile. Right: Cell nuclei overlay. Digital pathology images have rich context: cells organize by type into cellular communities with relatively well-defined geometries and orientations. For instance, nuclei from epithelial cells (green) are arranged in an oval geometry, while all other cells lie outside them and are oriented in different directions.
  • Figure 2: A structured way of defining context in computer vision.
  • Figure 3: Overview of CBAM. The module considers both channel and spatial attention to refine feature maps. The spatial attention module applies a $7 \times 7$ convolution over features pooled along the channel dimension, whereas the channel attention module applies a shared MLP to average- and max-pooled features along the spatial domain. Reproduced from Woo2018, DOI: 10.1007/978-3-030-01234-2_1
  • Figure 4: Top: Overview of an Object Relation Network. Middle: Object Relation Module. Bottom: Duplicate Removal Module. Reproduced from Hu2018, DOI: 10.1109/CVPR.2018.00378
  • Figure 5: Overview of DETR. Reproduced from Carion2020, DOI: 10.1007/978-3-030-58452-8_13
  • ...and 10 more figures
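
The CBAM description in the Figure 3 caption can be made concrete with a small NumPy sketch. This is an illustration under stated assumptions, not the paper's or the original CBAM implementation: the function names, the random weights, the reduction ratio of 4, the bias-free layers, and the channel-then-spatial ordering are all choices made here for the example. The spatial branch uses a cross-correlation with zero padding, which is what deep learning frameworks usually call a convolution.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def channel_attention(x, w1, w2):
    """Channel weights from a shared MLP over spatially pooled features.

    x: (C, H, W) feature map; w1: (C // r, C); w2: (C, C // r).
    Returns weights of shape (C, 1, 1) in (0, 1).
    """
    avg = x.mean(axis=(1, 2))  # average-pool over the spatial domain -> (C,)
    mx = x.max(axis=(1, 2))    # max-pool over the spatial domain -> (C,)

    def mlp(v):
        # The same two-layer MLP (ReLU in between) is applied to both vectors.
        return w2 @ np.maximum(w1 @ v, 0.0)

    return sigmoid(mlp(avg) + mlp(mx))[:, None, None]


def spatial_attention(x, kernel):
    """Spatial weights from a k x k convolution over channel-pooled maps.

    x: (C, H, W); kernel: (2, k, k) with odd k (7 in the caption).
    Returns weights of shape (1, H, W) in (0, 1).
    """
    pooled = np.stack([x.mean(axis=0), x.max(axis=0)])  # (2, H, W)
    k = kernel.shape[-1]
    pad = k // 2
    padded = np.pad(pooled, ((0, 0), (pad, pad), (pad, pad)))  # zero padding
    # windows[c, h, w] is the k x k patch centred on pixel (h, w) of map c.
    windows = np.lib.stride_tricks.sliding_window_view(padded, (k, k), axis=(1, 2))
    logits = np.einsum("chwij,cij->hw", windows, kernel)
    return sigmoid(logits)[None]


def cbam(x, w1, w2, kernel):
    """Refine x with channel attention, then spatial attention (assumed order)."""
    x = x * channel_attention(x, w1, w2)
    return x * spatial_attention(x, kernel)


# Hypothetical sizes and random weights, purely for illustration.
rng = np.random.default_rng(0)
x = rng.standard_normal((8, 16, 16))          # C=8, H=W=16
w1 = rng.standard_normal((2, 8)) * 0.1        # reduction ratio r=4 (assumed)
w2 = rng.standard_normal((8, 2)) * 0.1
kernel = rng.standard_normal((2, 7, 7)) * 0.1  # 7 x 7 kernel as in the caption
out = cbam(x, w1, w2, kernel)
print(out.shape)  # (8, 16, 16)
```

Because both attention maps pass through a sigmoid, every output value is the input value scaled by a factor in (0, 1); the module reweights features rather than creating new ones, which is why it can be dropped into an existing backbone without changing tensor shapes.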