Clinical-grade Multi-Organ Pathology Report Generation for Multi-scale Whole Slide Images via a Semantically Guided Medical Text Foundation Model

Jing Wei Tan; SeungKyu Kim; Eunsu Kim; Sung Hak Lee; Sangjeong Ahn; Won-Ki Jeong

TL;DR

The paper addresses the challenge of generating clinically valid pathology reports from large, multi-scale whole slide images (WSIs) by introducing PMPRG, a framework that combines a lightweight multi-scale regional vision transformer (MR-ViT) with a medical text foundation model. MR-ViT extracts regional features without supervision to capture multi-scale WSI context, and a tag-guided, organ-specific generation pipeline, trained on real pathology reports, produces structured, organ-aware reports. Key contributions include the MR-ViT encoder for efficient WSI representation, the PMPRG pipeline that yields patient-level, multi-organ reports with interpretable attention, and validated improvements on a kidney and colon dataset, with a METEOR score of about $0.68$. The work has practical implications for reducing pathologist workload and accelerating clinical reporting while maintaining interpretability, and it sets the stage for extension to more magnifications and organs.

Abstract

Vision language models (VLMs) have achieved success in both natural language comprehension and image recognition tasks. However, their use in pathology report generation for whole slide images (WSIs) is still limited due to the huge size of multi-scale WSIs and the high cost of WSI annotation. Moreover, most existing research on pathology report generation has not sufficiently validated clinical efficacy. Herein, we propose a novel Patient-level Multi-organ Pathology Report Generation (PMPRG) model, which utilizes the multi-scale WSI features from our proposed multi-scale regional vision transformer (MR-ViT) model and their real pathology reports to guide VLM training for accurate pathology report generation. The model then automatically generates a report based on the provided key feature-attended regional features. We assessed our model using a WSI dataset consisting of multiple organs, including the colon and kidney. Our model achieved a METEOR score of 0.68, demonstrating the effectiveness of our approach. This model allows pathologists to efficiently generate pathology reports for patients, regardless of the number of WSIs involved.

Paper Structure (12 sections, 6 equations, 2 figures, 2 tables)

Introduction
Method
Overview and Pre-processing
Multi-scale Regional Vision Transformer (MR-ViT)
Pathology Report Generation
Result
Experimental Result
Image Encoder
PMPRG
Conclusion
Acknowledgements
Disclosure of Interests

Figures (2)

Figure 1: Overview of our proposed pipeline: (A) PMPRG, (B) Multi-scale WSI, (C) MR-ViT, (D) Tag-guided feature extractor, and (E) GPT-2.
Figure 2: (a) The attention map depicts the importance of WSI regions for specific tags, where brighter regions indicate higher importance. (b) Examples of reports generated using our method and baseline methods.
