Towards a text-based quantitative and explainable histopathology image analysis

Anh Tien Nguyen, Trinh Thi Le Vuong, Jin Tae Kwak

TL;DR

The paper introduces TQx, a text-based quantitative histopathology analysis framework that uses a pre-trained vision-language model for image-to-text retrieval. TQx builds a word-of-interest (WoI) pool from large pathology text datasets and UMLS terms, then represents each image as a weighted combination of keyword text representations, so the resulting features are inherently interpretable. Across four public histopathology datasets, TQx achieves competitive clustering quality and classification performance, with improvements when more specific semantic pools are used, enabling quantitative analysis alongside human-readable explanations. The approach offers a self-explanatory way to quantify and interpret histopathology images without extensive post-processing, with potential for broader downstream tasks after WoI pool optimization.

Abstract

Recently, vision-language pre-trained models have emerged in computational pathology. Previous works generally focused on the alignment of image-text pairs via the contrastive pre-training paradigm. Such pre-trained models have been applied to pathology image classification in a zero-shot or transfer learning fashion. Herein, we hypothesize that pre-trained vision-language models can be utilized for quantitative histopathology image analysis through simple image-to-text retrieval. To this end, we propose a Text-based Quantitative and Explainable histopathology image analysis, which we call TQx. Given a set of histopathology images, we adopt a pre-trained vision-language model to retrieve a word-of-interest pool. The retrieved words are then used to quantify the histopathology images and generate understandable feature embeddings due to the direct mapping to the text description. To evaluate the proposed method, the text-based embeddings of four histopathology image datasets are utilized to perform clustering and classification tasks. The results demonstrate that TQx can quantify and analyze histopathology images with performance comparable to that of the prevalent visual models in computational pathology.
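
The retrieval-and-weighting step summarized above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: the random vectors stand in for the outputs of a pre-trained VLM's text and image encoders, the four keywords are hypothetical placeholders for a filtered word-of-interest pool, and the softmax normalization with a `temperature` parameter is an assumption (the source only says that similarity scores are normalized and used as weights).

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale vectors to unit length so dot products equal cosine similarity."""
    norm = np.linalg.norm(x, axis=axis, keepdims=True)
    return x / np.maximum(norm, eps)


def text_based_embedding(image_emb, word_embs, temperature=1.0):
    """Build a text-based embedding for one image.

    image_emb: (d,) visual embedding of the image.
    word_embs: (num_words, d) text embeddings of the word-of-interest pool.
    Returns the embedding (d,) and the per-word weights (num_words,).
    """
    image_emb = l2_normalize(image_emb)
    word_embs = l2_normalize(word_embs)

    # Cosine similarity between the image and every word in the pool.
    sims = word_embs @ image_emb

    # Assumed normalization: a numerically stable softmax over the pool.
    logits = sims / temperature
    logits = logits - logits.max()
    weights = np.exp(logits)
    weights = weights / weights.sum()

    # Weighted combination of the word embeddings.
    embedding = weights @ word_embs
    return embedding, weights


# Hypothetical pool and stand-in embeddings (a real pipeline would call
# the VLM's text encoder and image encoder here).
rng = np.random.default_rng(0)
words = ["adenocarcinoma", "carcinoma in situ", "adenoma", "lymphoma"]
word_embs = rng.normal(size=(len(words), 8))
image_emb = word_embs[0] + 0.1 * rng.normal(size=8)  # image close to word 0

embedding, weights = text_based_embedding(image_emb, word_embs, temperature=0.1)
ranked_words = [words[i] for i in np.argsort(-weights)]
print(ranked_words)
```

Because each weight is attached to a named keyword, the ranking in `ranked_words` doubles as a readable explanation of the embedding, which is the interpretability property the abstract refers to.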

Paper Structure (15 sections, 3 figures, 2 tables)

Figures (3)

  • Figure 1: The raw WoI pool stores all UMLS pathology terms of various semantic types. The filtered pool is obtained by selecting a particular semantic type under consideration. The pair of encoders from a pre-trained VLM generates text and visual embeddings, which are then compared with each other. The similarity scores from the comparison are normalized and then used as weights to produce a text-based embedding.
  • Figure 2: Clustering results with $\mathcal{W}_{Level-3}^{M}$ (Neoplastic Process) of (a) visual embeddings and (b) text-based image embeddings. In the Ground Truth plots, samples are re-assigned to the clusters using the ground truth class labels. The bottom numbers show silhouette coefficients measuring how similar an embedding is to its own cluster.
  • Figure 3: The bar plots show the percentage of samples per class in each cluster, based on the clustering with $\mathcal{W}_{Level-3}^{M}$ (Neoplastic Process). Five keywords with the highest average ranks are shown next to the corresponding bar plot.
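
As a reminder of what the silhouette coefficients in Figure 2 measure, the sketch below computes them from scratch. It assumes Euclidean distance and the standard definition s = (b - a) / max(a, b), where a is the mean distance from a sample to the other members of its own cluster and b is the mean distance to the nearest other cluster; the paper's exact distance metric is not stated in this excerpt, and the toy points are made up for illustration.

```python
import numpy as np


def silhouette_samples(X, labels):
    """Per-sample silhouette coefficients with Euclidean distance.

    Requires at least two clusters. Samples in singleton clusters get 0.
    """
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)

    # Pairwise Euclidean distance matrix, shape (n, n).
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))

    clusters = np.unique(labels)
    scores = np.zeros(len(X))
    for i in range(len(X)):
        same = labels == labels[i]
        n_same = same.sum()
        if n_same == 1:
            continue  # singleton cluster: coefficient defined as 0
        # a: mean distance to the other members of the same cluster
        # (the self-distance is 0, so divide by n_same - 1).
        a = dist[i, same].sum() / (n_same - 1)
        # b: smallest mean distance to any other cluster.
        b = min(dist[i, labels == c].mean() for c in clusters if c != labels[i])
        scores[i] = (b - a) / max(a, b)
    return scores


# Toy data: two tight, well-separated groups of points.
X = [[0, 0], [0, 1], [10, 0], [10, 1]]
good = silhouette_samples(X, [0, 0, 1, 1]).mean()  # labels match the groups
bad = silhouette_samples(X, [0, 1, 0, 1]).mean()   # labels cut across the groups
print(good, bad)
```

Values near 1 indicate compact, well-separated clusters, while values near or below 0 indicate that samples sit as close to another cluster as to their own.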