Diagnostic Text-guided Representation Learning in Hierarchical Classification for Pathological Whole Slide Image

Jiawen Li, Qiehe Sun, Renao Yan, Yizhi Wang, Yuqiu Fu, Yani Wei, Tian Guan, Huijuan Shi, Yonghonghe He, Anjia Han

TL;DR

This paper introduces the concept of hierarchical pathological image classification and proposes PathTree, a representation learning method that is consistently competitive with state-of-the-art methods and offers a new perspective on deep learning-assisted solutions for more complex WSI classification.

Abstract

With the development of digital imaging in medical microscopy, artificial intelligence-based analysis of pathological whole slide images (WSIs) provides a powerful tool for cancer diagnosis. Because pixel-level annotation is expensive, current research primarily focuses on representation learning with slide-level labels, which has shown success in various downstream tasks. However, given the diversity of lesion types and the complex relationships among them, these techniques still deserve further exploration for advanced pathology tasks. To this end, we introduce the concept of hierarchical pathological image classification and propose a representation learning method called PathTree. PathTree treats the multi-class classification of diseases as a binary tree structure. Each category is represented by a professional pathological text description, and these descriptions exchange information through a tree-like encoder. The resulting interactive text features then guide the aggregation of multiple hierarchical representations. PathTree uses slide-text similarity to obtain probability scores and introduces two extra tree-specific losses to further constrain the association between texts and slides. In extensive experiments on three challenging hierarchical classification datasets (in-house cryosectioned lung tissue lesion identification, public prostate cancer grade assessment, and public breast cancer subtyping), PathTree is consistently competitive with state-of-the-art methods and provides a new perspective on deep learning-assisted solutions for more complex WSI classification.
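
The abstract states that PathTree turns slide-text similarity into probability scores but does not spell out the computation here. The following is a minimal sketch of one plausible reading, not the authors' implementation: it assumes one slide-level embedding and one text embedding per fine-grained class, cosine similarity as the similarity measure, and a softmax with a hypothetical `temperature` parameter to normalize the scores. The function names are illustrative.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def slide_text_scores(slide_embs, text_embs, temperature=0.07):
    """Map per-class slide/text embedding pairs to probability scores.

    Assumption: slide_embs[i] and text_embs[i] both belong to fine-grained
    class i. The softmax and the temperature value are illustrative choices.
    """
    logits = [cosine(s, t) / temperature for s, t in zip(slide_embs, text_embs)]
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Toy example: class 0 is perfectly aligned, class 1 orthogonal, class 2 opposed.
slide_embs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
text_embs = [[1.0, 0.0], [1.0, 0.0], [-1.0, -1.0]]
probs = slide_text_scores(slide_embs, text_embs)
```

In this toy example the class whose slide and text embeddings point in the same direction receives the largest probability, and the scores sum to one.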

Paper Structure

This paper contains 27 sections, 15 equations, 13 figures, 7 tables, 2 algorithms.

Figures (13)

  • Figure 1: The existing WSI multi-classification method is described as a planar classification problem, which treats each category independently and equally. However, the interrelations among various categories exhibit considerable complexity in real-world pathological contexts. Pathologists typically analyze from coarse-grained to fine-grained levels, following a hierarchical, tree-like relationship among classes. This approach enhances the efficiency and precision of diagnostics and helps pathologists systematically understand and interpret complex pathological information, making more accurate medical decisions.
  • Figure 2: Overview of PathTree. The main idea is to convert challenging pathological multi-classification into hierarchical tree structures for analysis. PathTree uses the WSI to generate multiple slide-level embeddings from patch-level features, allowing them to contrast text semantics. After aggregation based on the tree path, the relationship between tree-like text semantics is measured by two structure-specific losses, and prediction scores are obtained by calculating the cosine similarity between slide and fine-grained text embeddings.
  • Figure 3: Three different text forms of pathological lung tissue categories. (a) Planar text labels, using only fine-grained category names; (b) hierarchical text labels, using both fine-grained and coarse-grained category names; (c) hierarchical descriptive text labels, with each fine-grained and coarse-grained category described in detail using pathology terminology.
  • Figure 4: Schematic diagram of the two attention modules. (a) Multiple gated attention assigns multiple attention scores to each patch, which are then used to weight the patch embeddings to obtain multiple slide representations; (b) multi-head Nyström attention assigns $2N-1$ heads, then uses the linear Nyström method to update the embedding in each head, and finally obtains multiple slide representations through an average pooling layer.
  • Figure 5: Graphical demonstration of path alignment learning. Its goal is to make the slide-level embedding of the target node close to all text embeddings in its path to the root node.
  • ...and 8 more figures
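
The captions for Figures 4 and 5 refer to $2N-1$ heads and to the path from a target node to the root. As a rough illustration under stated assumptions, the sketch below builds a small hypothetical hierarchy, checks that a full binary tree with $N$ leaves has $2N-1$ nodes (assuming $N$ counts the fine-grained classes), and computes a stand-in path alignment penalty. The node names, the `PARENT` map, and the mean of `1 - cosine` penalty are all assumptions for illustration; the paper's actual loss is not given in this section.

```python
import math

# Hypothetical hierarchy with three fine-grained (leaf) classes.
# Node names are illustrative and do not come from the paper.
PARENT = {
    "root": None,
    "benign": "root",
    "malignant": "root",
    "subtype_a": "malignant",
    "subtype_b": "malignant",
}

def path_to_root(node, parent=PARENT):
    """Return the nodes from `node` up to and including the root."""
    path = []
    while node is not None:
        path.append(node)
        node = parent[node]
    return path

def leaves(parent=PARENT):
    """Return nodes that are nobody's parent, i.e. fine-grained classes."""
    internal = {p for p in parent.values() if p is not None}
    return [n for n in parent if n not in internal]

def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def path_alignment_penalty(slide_emb, text_embs, node, parent=PARENT):
    """Stand-in objective: mean (1 - cosine) between the slide embedding of
    `node` and every text embedding on its path to the root. It is zero when
    the slide embedding points in the same direction as all of them."""
    path = path_to_root(node, parent)
    return sum(1.0 - cosine(slide_emb, text_embs[n]) for n in path) / len(path)

# Toy text embeddings: every node on subtype_a's path shares one direction.
text_embs = {n: [1.0, 2.0] for n in PARENT}
penalty = path_alignment_penalty([2.0, 4.0], text_embs, "subtype_a")
```

Minimizing a penalty of this shape pulls the slide embedding toward both the coarse-grained and fine-grained descriptions on the path, which is the qualitative behavior the Figure 5 caption describes.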