Unit-Based Histopathology Tissue Segmentation via Multi-Level Feature Representation

Ashkan Shakarami, Azade Farshad, Yousef Yeganeh, Lorenzo Nicole, Peter Schüffler, Stefano Ghidoni, Nassir Navab

TL;DR

This paper tackles the annotation and computational bottlenecks of pixel-wise histopathology segmentation by introducing a unit-based framework (UTS) that classifies fixed-size tiles of $32 \times 32$ pixels. Central to UTS is L-ViT, a Multi-Level Vision Transformer that combines an EfficientNetB3 backbone, MLFF, and attention modules (DAT-SE, D-CBAM) to capture both local morphology and global tissue context. On a large, tile-based breast tissue dataset, the approach outperforms CNN baselines and state-of-the-art pixel-wise models in DSC and IoU while offering substantial efficiency gains. A refinement stage (Neighborhood-Based Smoothing and Class Discretization) improves boundary coherence and interpretability and supports clinical workflows through WSI overlays and quantitative tissue composition analysis. The work positions unit-based segmentation as a scalable, annotation-efficient paradigm for digital pathology, with practical impact on tumor quantification and surgical margin assessment.
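
The refinement stage is only named here, so what follows is a minimal sketch under stated assumptions rather than the authors' procedure. It assumes the classifier emits a class-probability vector for each tile on a grid, that Neighborhood-Based Smoothing averages these vectors over a $3 \times 3$ window of tiles (clipped at the border), and that Class Discretization keeps the most probable class. The names `smooth_probabilities` and `discretize`, and the class order, are hypothetical.

```python
from typing import List

# rows x cols grid of per-tile class-probability vectors (assumed model output)
ProbGrid = List[List[List[float]]]


def smooth_probabilities(probs: ProbGrid, radius: int = 1) -> ProbGrid:
    """Assumed Neighborhood-Based Smoothing: average each tile's probability
    vector over a (2*radius+1) x (2*radius+1) window, clipped at the grid border."""
    rows, cols, n_cls = len(probs), len(probs[0]), len(probs[0][0])
    smoothed: ProbGrid = []
    for r in range(rows):
        out_row = []
        for c in range(cols):
            acc, count = [0.0] * n_cls, 0
            for rr in range(max(0, r - radius), min(rows, r + radius + 1)):
                for cc in range(max(0, c - radius), min(cols, c + radius + 1)):
                    acc = [a + p for a, p in zip(acc, probs[rr][cc])]
                    count += 1
            out_row.append([a / count for a in acc])
        smoothed.append(out_row)
    return smoothed


def discretize(probs: ProbGrid) -> List[List[int]]:
    """Assumed Class Discretization: keep the index of the most probable class."""
    return [[max(range(len(p)), key=p.__getitem__) for p in row] for row in probs]


if __name__ == "__main__":
    # Hypothetical class order: 0 = tumor, 1 = stroma, 2 = fat.
    grid = [[[0.1, 0.8, 0.1] for _ in range(3)] for _ in range(3)]
    grid[1][1] = [0.6, 0.3, 0.1]  # isolated tumor call inside stroma
    print(discretize(grid)[1])                        # [1, 0, 1]
    print(discretize(smooth_probabilities(grid))[1])  # [1, 1, 1]
```

In this toy grid, an isolated tumor call surrounded by stroma is relabeled as stroma after smoothing. A larger `radius` smooths more aggressively, at the cost of erasing small but genuine lesions.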

Abstract

We propose UTS, a unit-based tissue segmentation framework for histopathology that treats each fixed-size $32 \times 32$ tile, rather than each pixel, as the unit of classification. This approach reduces annotation effort and improves computational efficiency without compromising accuracy. To implement it, we introduce a Multi-Level Vision Transformer (L-ViT), which leverages multi-level feature representations to capture both fine-grained morphology and global tissue context. Trained to segment breast tissue into three categories (infiltrating tumor, non-neoplastic stroma, and fat), UTS supports clinically relevant tasks such as tumor-stroma quantification and surgical margin assessment. Evaluated on 386,371 tiles from 459 H&E-stained regions, it outperforms U-Net variants and transformer-based baselines. Code and dataset will be made available on GitHub.
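
Treating a $32 \times 32$ tile as the segmentation unit means one label stands in for $32 \times 32 = 1024$ pixels, which is where the annotation savings come from. The sketch below is a hypothetical illustration, not the paper's SlideTiler: `tile_origins` enumerates the full tiles in a region (assuming partial border tiles are discarded), and `tiles_to_pixel_mask` expands a grid of tile labels back into a pixel-level mask for overlays.

```python
from typing import List, Tuple

TILE = 32  # tile edge in pixels, as stated in the paper


def tile_origins(height: int, width: int, tile: int = TILE) -> List[Tuple[int, int]]:
    """(row, col) pixel offsets of every full tile in a height x width region.
    Assumption: partial tiles at the bottom/right border are discarded."""
    return [(r, c)
            for r in range(0, height - tile + 1, tile)
            for c in range(0, width - tile + 1, tile)]


def tiles_to_pixel_mask(tile_labels: List[List[int]], tile: int = TILE) -> List[List[int]]:
    """Expand a grid of tile labels into a pixel mask for overlays:
    every pixel inherits the label of the tile that contains it."""
    mask: List[List[int]] = []
    for row in tile_labels:
        pixel_row = [label for label in row for _ in range(tile)]
        mask.extend(list(pixel_row) for _ in range(tile))
    return mask


if __name__ == "__main__":
    h, w = 256, 320
    n_tiles = len(tile_origins(h, w))
    print(n_tiles, h * w, h * w // n_tiles)  # 80 81920 1024
```

For a $256 \times 320$ pixel region, this yields $8 \times 10 = 80$ tiles, so 80 labels replace 81,920 per-pixel labels.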

Paper Structure

This paper contains 33 sections, 14 equations, 6 figures, 5 tables.

Figures (6)

  • Figure 1: UTS Pipeline. (a) Tissue extraction via biopsy or surgery. (b) H&E-stained WSI acquisition through processing, staining, and digitization. (c) SlideTiler preprocessing for WSI or ROI selection and standardized tiling into $32 \times 32$ pixel tiles. (d) Segmentation using L-ViT to classify tiles into Infiltrating Breast Tumor, Fat Tissue, and Non-neoplastic Stroma, visualized with color-coded overlays. The system also computes tissue composition ratios (e.g., Tumor: 51.71%, Stroma: 12.22%, Fat: 36.07%), enabling automated tumor-stroma quantification based on unit-level classification using fixed-size $32 \times 32$ tiles. This integrated analysis supports Tumor–Stroma Ratio (TSR) estimation, providing interpretable metrics for prognostic assessment and personalized treatment planning.
  • Figure 2:
  • Figure 3: DAT-SE: Enhances feature representation by recalibrating input maps with channel and spatial attention [hu2018].
  • Figure 4: D-CBAM: Extends CBAM [Alirezazadeh2023, woo2018] by integrating channel and spatial attention for improved feature focus.
  • Figure 5: VTM architecture, built from Transformer blocks [dosovitskiy2020] with Multi-Head Self-Attention, Feed-Forward Networks, and Layer Normalization.
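
Because every tile covers the same area, the tissue composition ratios in Figure 1 reduce to class frequencies over the tile labels. The sketch below assumes tiles are already labeled with one of the three classes. `composition` and `stroma_percentage` are hypothetical helpers, and expressing TSR as the stroma share of the tumor-plus-stroma area is one common convention, assumed here rather than taken from the paper.

```python
from collections import Counter
from typing import Dict, Sequence

CLASSES = ("tumor", "stroma", "fat")  # the three categories named in the paper


def composition(tile_labels: Sequence[str]) -> Dict[str, float]:
    """Percentage of tiles per class. Tiles share one fixed size, so tile
    counts are proportional to area. Labels outside CLASSES are ignored."""
    counts = Counter(tile_labels)
    total = sum(counts[c] for c in CLASSES)
    if total == 0:
        raise ValueError("no tissue tiles to summarize")
    return {c: 100.0 * counts[c] / total for c in CLASSES}


def stroma_percentage(tile_labels: Sequence[str]) -> float:
    """Assumed TSR convention: stroma as a share of tumor + stroma area, in %."""
    counts = Counter(tile_labels)
    denom = counts["tumor"] + counts["stroma"]
    if denom == 0:
        raise ValueError("no tumor or stroma tiles")
    return 100.0 * counts["stroma"] / denom


if __name__ == "__main__":
    labels = ["tumor"] * 50 + ["stroma"] * 20 + ["fat"] * 30
    print(composition(labels))                  # {'tumor': 50.0, 'stroma': 20.0, 'fat': 30.0}
    print(round(stroma_percentage(labels), 2))  # 28.57
```

Fat tiles count toward the overall composition but not toward the assumed TSR, since that ratio only compares the tumor and stroma compartments.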
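
Figures 3 and 4 describe DAT-SE and D-CBAM only at a high level. For orientation, here is a plain-Python sketch of the two standard building blocks they extend: squeeze-and-excitation channel gating [hu2018] and CBAM-style spatial gating [woo2018]. It is not the authors' DAT-SE or D-CBAM. Weights are passed in by hand, biases are omitted, and CBAM's $7 \times 7$ convolution over the pooled maps is simplified to a $1 \times 1$ weighting.

```python
import math
from typing import List

Tensor3 = List[List[List[float]]]  # channels x height x width


def sigmoid(v: float) -> float:
    # Numerically stable logistic function.
    if v >= 0:
        return 1.0 / (1.0 + math.exp(-v))
    e = math.exp(v)
    return e / (1.0 + e)


def channel_attention(x: Tensor3, w1: List[List[float]], w2: List[List[float]]) -> Tensor3:
    """SE-style gating: squeeze each channel by global average pooling,
    excite with FC (w1: C/r x C) -> ReLU -> FC (w2: C x C/r) -> sigmoid,
    then rescale every channel by its gate."""
    squeezed = [sum(map(sum, ch)) / (len(ch) * len(ch[0])) for ch in x]
    hidden = [max(0.0, sum(w * s for w, s in zip(row, squeezed))) for row in w1]
    gates = [sigmoid(sum(w * h for w, h in zip(row, hidden))) for row in w2]
    return [[[v * g for v in r] for r in ch] for ch, g in zip(x, gates)]


def spatial_attention(x: Tensor3, w_avg: float, w_max: float) -> Tensor3:
    """CBAM-style gating from channel-wise average and max maps.
    Simplification: a 1x1 weighting replaces CBAM's 7x7 convolution."""
    c, h, w = len(x), len(x[0]), len(x[0][0])
    gate = [[sigmoid(w_avg * sum(x[k][i][j] for k in range(c)) / c
                     + w_max * max(x[k][i][j] for k in range(c)))
             for j in range(w)] for i in range(h)]
    return [[[x[k][i][j] * gate[i][j] for j in range(w)] for i in range(h)]
            for k in range(c)]


if __name__ == "__main__":
    x = [[[1.0, 2.0], [3.0, 4.0]], [[2.0, 2.0], [2.0, 2.0]]]  # 2 channels, 2 x 2
    # CBAM order: channel attention first, then spatial attention.
    y = spatial_attention(channel_attention(x, [[1.0, 1.0]], [[1.0], [-1.0]]), 0.5, 0.5)
    print(len(y), len(y[0]), len(y[0][0]))  # 2 2 2
```

Both gates lie in $(0, 1)$, so each module can only reweight features, never amplify them. Sequencing channel before spatial gating follows the original CBAM design.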
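
Figure 5's Transformer block follows the ViT layout [dosovitskiy2020]. The sketch below implements one pre-norm encoder block in plain Python: LayerNorm, multi-head self-attention with scaled dot products, and a residual connection, followed by LayerNorm, a two-layer feed-forward network, and a second residual connection. It simplifies in several ways: no biases, no learnable LayerNorm scale or shift, and ReLU in place of ViT's GELU. It illustrates the block structure, not the paper's VTM configuration, and the parameter names (`wq`, `wk`, `wv`, `wo`, `w1`, `w2`) are hypothetical.

```python
import math
from typing import Dict, List

Matrix = List[List[float]]  # tokens x features


def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def add(a: Matrix, b: Matrix) -> Matrix:
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def layer_norm(x: Matrix, eps: float = 1e-5) -> Matrix:
    """Normalize each token to zero mean, unit variance (no learnable scale/shift)."""
    out = []
    for row in x:
        mu = sum(row) / len(row)
        var = sum((v - mu) ** 2 for v in row) / len(row)
        out.append([(v - mu) / math.sqrt(var + eps) for v in row])
    return out


def softmax(row: List[float]) -> List[float]:
    m = max(row)  # subtract the max for numerical stability
    e = [math.exp(v - m) for v in row]
    s = sum(e)
    return [v / s for v in e]


def multi_head_self_attention(x: Matrix, p: Dict[str, Matrix], heads: int) -> Matrix:
    q, k, v = matmul(x, p["wq"]), matmul(x, p["wk"]), matmul(x, p["wv"])
    dh = len(q[0]) // heads  # per-head width; heads must divide the model width
    concat: Matrix = [[] for _ in x]
    for h in range(heads):
        cols = slice(h * dh, (h + 1) * dh)
        qh, kh, vh = [r[cols] for r in q], [r[cols] for r in k], [r[cols] for r in v]
        # Scaled dot-product attention: softmax(Q K^T / sqrt(d_h)) V
        scores = [[sum(a * b for a, b in zip(qi, kj)) / math.sqrt(dh) for kj in kh]
                  for qi in qh]
        head_out = matmul([softmax(r) for r in scores], vh)
        for i, r in enumerate(head_out):
            concat[i].extend(r)
    return matmul(concat, p["wo"])  # output projection after concatenating heads


def transformer_block(x: Matrix, p: Dict[str, Matrix], heads: int) -> Matrix:
    """Pre-norm encoder block: x + MHSA(LN(x)), then x + FFN(LN(x))."""
    x = add(x, multi_head_self_attention(layer_norm(x), p, heads))
    hidden = [[max(0.0, v) for v in r] for r in matmul(layer_norm(x), p["w1"])]  # ReLU
    return add(x, matmul(hidden, p["w2"]))


def demo_weights(rows: int, cols: int) -> Matrix:
    # Deterministic toy weights for the demo only.
    return [[0.1 * ((i * cols + j) % 5 - 2) for j in range(cols)] for i in range(rows)]


if __name__ == "__main__":
    d, dff, tokens = 4, 8, 3
    params = {"wq": demo_weights(d, d), "wk": demo_weights(d, d),
              "wv": demo_weights(d, d), "wo": demo_weights(d, d),
              "w1": demo_weights(d, dff), "w2": demo_weights(dff, d)}
    x = [[float(i + j) for j in range(d)] for i in range(tokens)]
    y = transformer_block(x, params, heads=2)
    print(len(y), len(y[0]))  # 3 4
```

The residual connections preserve the token-by-feature shape, which is what lets such blocks be stacked. With all weights set to zero, the block reduces to the identity.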