GCUNet: A GNN-Based Contextual Learning Network for Tertiary Lymphoid Structure Semantic Segmentation in Whole Slide Image

Lei Su, Yang Du

TL;DR

This work tackles TLS semantic segmentation in whole-slide images by introducing GCUNet, a graph neural network-based framework that aggregates long-range, fine-grained context from outside the target patch. A context graph is built over image patches and updated via multi-layer GCNs, and a DCFusion module fuses these contextual cues with detailed patch representations to produce TLS segmentation masks. The authors build four TLS semantic segmentation datasets, three of which are released publicly, and report that GCUNet achieves at least a 7.41% improvement in mF1 over state-of-the-art methods, with strong gains on both four-class and three-class segmentation tasks. This approach enables maturation-aware TLS delineation in WSIs and offers a practical path toward more accurate, context-aware computational pathology workflows.

Abstract

We focus on tertiary lymphoid structure (TLS) semantic segmentation in whole slide images (WSIs). Unlike TLS binary segmentation, TLS semantic segmentation identifies both boundaries and maturity, which requires integrating contextual information to discover discriminative features. Due to the extensive scale of WSIs (e.g., 100,000 × 100,000 pixels), TLS segmentation is usually carried out with a patch-based strategy. However, this prevents the model from accessing information outside the patches, which limits performance. To address this issue, we propose GCUNet, a GNN-based contextual learning network for TLS semantic segmentation. Given an image patch (target) to be segmented, GCUNet first progressively aggregates long-range and fine-grained context outside the target. Then, a Detail and Context Fusion block (DCFusion) is designed to integrate the context and detail of the target to predict the segmentation mask. We build four TLS semantic segmentation datasets, called TCGA-COAD, TCGA-LUSC, TCGA-BLCA and INHOUSE-PAAD, and make the former three datasets (comprising 826 WSIs and 15,276 TLSs) publicly available to promote research on TLS semantic segmentation. Experiments on these datasets demonstrate the superiority of GCUNet, which achieves at least a 7.41% improvement in mF1 compared with SOTA.
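To make the idea of progressive context aggregation concrete, the following is a minimal sketch of how stacked graph-convolution layers widen the region of a slide that a target patch can draw on. It is not the authors' implementation. Everything specific here is an assumption made for illustration: the 8-neighbour grid connectivity, the symmetric normalization in the style of a standard GCN layer, the scalar node features, and the identity weight. The helper names (`grid_adjacency`, `normalize`, `gcn_layer`) are hypothetical.

```python
import math


def grid_adjacency(rows, cols):
    """Adjacency with self-loops for patches laid out on a rows x cols grid.

    Assumption: each patch is linked to its 8 spatial neighbours. The
    summary above does not say how the context graph is actually built.
    """
    n = rows * cols
    adj = [[0.0] * n for _ in range(n)]
    for r in range(rows):
        for c in range(cols):
            i = r * cols + c
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < rows and 0 <= cc < cols:
                        # dr == dc == 0 adds the self-loop
                        adj[i][rr * cols + cc] = 1.0
    return adj


def normalize(adj):
    """Symmetric normalization D^{-1/2} A D^{-1/2}, with A including self-loops."""
    deg = [sum(row) for row in adj]
    n = len(adj)
    return [[adj[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]


def matmul(a, b):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def gcn_layer(a_hat, feats, weight):
    """One graph-convolution layer: ReLU(A_hat @ H @ W)."""
    out = matmul(matmul(a_hat, feats), weight)
    return [[max(0.0, v) for v in row] for row in out]


# Toy setup: a 5 x 5 grid of patches with a signal only in the corner patch.
a_hat = normalize(grid_adjacency(5, 5))
feats = [[0.0] for _ in range(25)]
feats[0][0] = 1.0          # patch (0, 0) carries the only non-zero feature
weight = [[1.0]]           # identity weight, so only propagation is visible

target = 2 * 5 + 2         # centre patch (2, 2), two hops from the corner
after_one = gcn_layer(a_hat, feats, weight)
after_two = gcn_layer(a_hat, after_one, weight)
print("target after 1 layer:", after_one[target][0])
print("target reached after 2 layers:", after_two[target][0] > 0.0)
```

In this toy setup the centre patch receives nothing from the corner after one layer and a non-zero contribution after two, since each additional layer extends the reachable context by one hop. This is the general mechanism by which a multi-layer GCN can bring information from outside a patch into its representation; how GCUNet defines nodes, edges, and features is not specified in this summary.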

Paper Structure

This paper contains 18 sections, 5 equations, 5 figures, 5 tables.

Figures (5)

  • Figure 1: GCUNet gathers discriminative features by aggregating contextual information outside the target patch.
  • Figure 2: The architecture of the proposed GCUNet.
  • Figure 3: Visualization of segmentation results for three types of TLS: E-TLS, PET-TLS, and SEL-TLS. E-TLS are highlighted in red, PET-TLS in blue, and SEL-TLS in green according to our annotation guidelines. Each TLS contains a pair of images in two rows: the top row shows a global view, while the bottom row provides a detailed view of the highlighted region.
  • Figure 4: The impact of changing the number of GCN layers on four evaluation metrics.
  • Figure 5: The impact of changing the pixel spatial resolution.