VLM-based Prompts as the Optimal Assistant for Unpaired Histopathology Virtual Staining

Zizhi Chen, Xinyu Zhang, Minghao Han, Yizhou Liu, Ziyun Qian, Weifeng Zhang, Xukun Zhang, Jingwei Wei, Lihua Zhang

TL;DR

This work addresses the challenge of virtually staining histopathology images while preserving cytological structure and accounting for staining physics. It introduces a pathology-aware, VLM-assisted framework (VPGAN) guided by three prompt modules (contrastive prompts, constant concept anchoring, and independent concept reinforcement), plus an inference-enhancement system (HARBOR) that uses DDIM and multi-level calibration to prevent staining-domain collapse. By leveraging a pathology-specific vision-language model, the authors report state-of-the-art realism and improved downstream glomerular detection and segmentation on unpaired staining datasets, with ablations confirming the value of each module. The approach supports data augmentation and has practical potential for reducing staining costs and improving diagnostic workflows in pathology.

Abstract

In histopathology, tissue sections are typically stained using common H&E staining or special stains (MAS, PAS, PASM, etc.) to clearly visualize specific tissue structures. The rapid advancement of deep learning offers an effective solution for generating virtually stained images, significantly reducing the time and labor costs associated with traditional histochemical staining. However, a new challenge arises in separating the fundamental visual characteristics of tissue sections from the visual differences induced by staining agents. Additionally, virtual staining often overlooks essential pathological knowledge and the physical properties of staining, resulting in only style-level transfer. To address these issues, we introduce, for the first time in virtual staining tasks, a pathological vision-language large model (VLM) as an auxiliary tool. We integrate contrastive learnable prompts, foundational concept anchors for tissue sections, and staining-specific concept anchors to leverage the extensive knowledge of the pathological VLM. This approach is designed to describe, frame, and enhance the direction of virtual staining. Furthermore, we have developed a data augmentation method based on the constraints of the VLM. This method utilizes the VLM's powerful image interpretation capabilities to further integrate image style and structural information, proving beneficial in high-precision pathological diagnostics. Extensive evaluations on publicly available multi-domain unpaired staining datasets demonstrate that our method can generate highly realistic images and enhance the accuracy of downstream tasks, such as glomerular detection and segmentation. Our code is available at: https://github.com/CZZZZZZZZZZZZZZZZZ/VPGAN-HARBOR
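To make the idea of contrastive prompts more concrete, the following is a minimal sketch of how a generated image could be scored against two prompt embeddings in a shared VLM embedding space. It is not the authors' implementation: the function names, the two-prompt setup (one target-stain prompt, one source-stain prompt), and the temperature value are all assumptions for illustration, and the embeddings are plain Python lists standing in for real VLM outputs.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def contrastive_prompt_loss(image_emb, target_prompt_emb, source_prompt_emb,
                            temperature=0.07):
    """Hypothetical binary contrastive loss (assumed form, not from the paper).

    Treats the two prompts as the two classes of a binary classifier and
    returns the cross-entropy of the image being assigned to the target
    prompt. A low value means the image sits closer to the target-stain
    prompt than to the source-stain prompt.
    """
    logits = [
        cosine(image_emb, target_prompt_emb) / temperature,
        cosine(image_emb, source_prompt_emb) / temperature,
    ]
    # Numerically stable log-sum-exp over the two logits.
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(l - m) for l in logits))
    return -(logits[0] - log_sum)


# Toy embeddings: the image is aligned with the target prompt.
loss_aligned = contrastive_prompt_loss([1.0, 0.0], [1.0, 0.0], [0.0, 1.0])
# The image is aligned with the source prompt instead.
loss_misaligned = contrastive_prompt_loss([0.0, 1.0], [1.0, 0.0], [0.0, 1.0])
```

In this toy setup, the loss is close to zero when the image embedding matches the target prompt and grows large when it matches the source prompt, which is the kind of directional signal a generator could be trained against.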

Paper Structure

This paper contains 24 sections, 25 equations, 6 figures, 7 tables.

Figures (6)

  • Figure 1: Three VLM-based auxiliary methods we propose for virtual staining tasks: (a) learnable contrastive prompts based on a classification task; (b) concept anchor design based on an LLM; (c) visual calibration based on the VLM.
  • Figure 2: Overview of the proposed VPGAN and HARBOR. In the prompt generation phase, we use prompt tuning to generate contrastive prompts based on a binary classification task. Using DeepSeek-R1, we create constant concept anchors and independent concept anchors for the different staining agents. During the training phase, we use the three types of prompts and a pathological VLM to describe, frame, and reinforce the virtual staining direction, thereby optimizing the original virtual staining model. In the inference enhancement phase, we train learnable denoising prompt blocks under structural and stylistic constraints, further improving virtual staining performance.
  • Figure 3: A fine-grained, VLM-based verification process demonstrated on the H&E2PASM task, in which the staining domains are verified progressively.
  • Figure 4: Performance comparison of existing methods and our proposed method on multiple stain transfers of the same H&E-stained image.
  • Figure 5: Overview of Downstream Task Datasets
  • ...and 1 more figure