HistoLens: An Interactive XAI Toolkit for Verifying and Mitigating Flaws in Vision-Language Models for Histopathology

Sandeep Vissapragada, Vikrant Sahu, Gagan Raj Gupta, Vandita Singh

TL;DR

The paper addresses a trust and prompting gap in Vision-Language Models for histopathology, where opaque reasoning and brittle prompts impede clinical adoption. It presents HistoLens, a modular workflow integrating a Semantic Prompt Synthesizer, MedGemma-4B-IT, and a Multi-Modal XAI Engine to produce verifiable, tissue-focused analyses. A key innovation is ROI In-painting to mitigate shortcut learning, complemented by CAM-based heatmaps spanning from regional hotspots to pixel-level cues in a structured JSON report. On a 60-image dataset with expert validation, HistoLens achieved 86.7% agreement with senior pathologists and a 21% improvement in focus when ROI in-painting was enabled, supporting clinical readiness and outlining multi-institution validation as future work.
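The summary above does not say how background suppression or the "focus" improvement is computed, so the following is only an illustrative sketch under stated assumptions: tissue is separated from a near-white slide background by a brightness threshold, non-tissue pixels are filled with the mean tissue colour as a crude stand-in for real in-painting, and focus is taken to be the fraction of heatmap mass that falls on tissue. The function names, the threshold, and the focus definition are all hypothetical and are not taken from HistoLens.

```python
import numpy as np


def tissue_mask(image, background_threshold=0.85):
    """Hypothetical tissue detector for an RGB image scaled to [0, 1].

    Assumes slide background is near-white, so tissue is any pixel whose
    mean channel intensity is below the threshold.
    """
    gray = image.mean(axis=-1)
    return gray < background_threshold


def inpaint_background(image, mask):
    """Fill non-tissue pixels with the mean tissue colour.

    This is a simple stand-in for an in-painting step, not the paper's method.
    """
    out = image.copy()
    if mask.any():
        out[~mask] = image[mask].mean(axis=0)
    return out


def focus_score(heatmap, mask):
    """Assumed focus metric: share of total heatmap mass inside the tissue mask."""
    total = heatmap.sum()
    if total == 0:
        return 0.0
    return float(heatmap[mask].sum() / total)
```

Under these assumptions, comparing `focus_score` for heatmaps produced from the original image and from the `inpaint_background` output would give one way to quantify whether attention shifts toward tissue.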

Abstract

For doctors to truly trust artificial intelligence, it can't be a black box. They need to understand its reasoning, almost as if they were consulting a colleague. We created HistoLens to be that transparent, collaborative partner. It allows a pathologist to simply ask a question in plain English about a tissue slide--just as they would ask a trainee. Our system intelligently translates this question into a precise query for its AI engine, which then provides a clear, structured report. But it doesn't stop there. If a doctor ever asks, "Why?", HistoLens can instantly provide a 'visual proof' for any finding--a heatmap that points to the exact cells and regions the AI used for its analysis. We've also ensured the AI focuses only on the patient's tissue, just like a trained pathologist would, by teaching it to ignore distracting background noise. The result is a workflow where the pathologist remains the expert in charge, using a trustworthy AI assistant to verify their insights and make faster, more confident diagnoses.

Paper Structure

This paper contains 16 sections, 2 figures, 1 table.

Figures (2)

  • Figure 1: The HistoLens workflow. A pathologist's natural language query about a Ki-67 stained slide is converted by the Semantic Prompt Synthesizer, analyzed by the VLM Core, and the result is visualized with the XAI Engine, allowing for full transparency and verification.
  • Figure 2: Visual verification using the HistoLens XAI toolkit. The left panel shows the original PD-L1 stained input image. The VLM identified the "staining_location_per_cell" as cytoplasmic. The right panel shows the corresponding Grad-CAM heatmap, which confirms the model correctly focused on the cytoplasm of the tumor cells (highlighted in red/yellow), increasing the pathologist's trust in the output.
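
As a rough illustration of how a Grad-CAM heatmap like the one in Figure 2 is formed, the sketch below applies the standard Grad-CAM combination step to arrays that are assumed to have been extracted already: feature-map activations from a chosen layer and the gradients of the target output with respect to them. How HistoLens hooks into MedGemma to obtain these arrays is not described here, so the function name and input shapes are assumptions.

```python
import numpy as np


def grad_cam(activations, gradients):
    """Combine feature maps into a Grad-CAM heatmap.

    activations, gradients: arrays of shape (K, H, W) for K feature maps.
    Returns an (H, W) map scaled to [0, 1] (all zeros if nothing is positive).
    """
    # Channel weights: global average of the gradients over spatial positions.
    weights = gradients.mean(axis=(1, 2))
    # Weighted sum of feature maps over the channel axis.
    cam = np.tensordot(weights, activations, axes=1)
    # ReLU keeps only regions with a positive influence on the target.
    cam = np.maximum(cam, 0.0)
    peak = cam.max()
    return cam / peak if peak > 0 else cam
```

The resulting map has the spatial resolution of the chosen layer, so in practice it would be upsampled to the input image size before being overlaid as a red/yellow heatmap.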