HistoLens: An Interactive XAI Toolkit for Verifying and Mitigating Flaws in Vision-Language Models for Histopathology
Sandeep Vissapragada, Vikrant Sahu, Gagan Raj Gupta, Vandita Singh
TL;DR
The paper addresses a trust and prompting gap in Vision-Language Models for histopathology, where opaque reasoning and brittle prompts impede clinical adoption. It presents HistoLens, a modular workflow integrating a Semantic Prompt Synthesizer, MedGemma-4B-IT, and a Multi-Modal XAI Engine to produce verifiable, tissue-focused analyses. A key innovation is ROI In-painting to mitigate shortcut learning, complemented by CAM-based heatmaps, ranging from regional hotspots to pixel-level cues, in a structured JSON report. On a 60-image dataset with expert validation, HistoLens achieved 86.7% agreement with senior pathologists and a 21% improvement in focus when ROI in-painting was enabled, supporting clinical readiness and outlining multi-institution validation as future work.
Abstract
For doctors to truly trust artificial intelligence, it can't be a black box. They need to understand its reasoning, almost as if they were consulting a colleague. We created HistoLens to be that transparent, collaborative partner. It allows a pathologist to ask a question in plain English about a tissue slide, just as they would ask a trainee. Our system translates this question into a precise query for its AI engine, which then returns a clear, structured report. But it doesn't stop there. If a doctor asks, "Why?", HistoLens can provide a 'visual proof' for any finding: a heatmap that points to the exact cells and regions the AI used for its analysis. We've also ensured the AI focuses only on the patient's tissue, just like a trained pathologist would, by teaching it to ignore distracting background noise. The result is a workflow where the pathologist remains the expert in charge, using a trustworthy AI assistant to verify their insights and make faster, more confident diagnoses.
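
To make the idea of restricting the model to the patient's tissue concrete, the following is a minimal sketch of background in-painting. It is an illustration under stated assumptions, not the HistoLens implementation: the function names `tissue_mask` and `inpaint_background`, the near-white threshold of 220, and the uniform white fill are all hypothetical choices, and the paper's actual ROI In-painting procedure may differ.

```python
import numpy as np


def tissue_mask(rgb: np.ndarray, white_threshold: int = 220) -> np.ndarray:
    """Hypothetical tissue detector (assumption, not from the paper).

    A pixel counts as tissue if at least one channel is darker than
    `white_threshold`, i.e. the pixel is not near-white slide background.
    Returns a boolean array of shape (H, W).
    """
    return (rgb < white_threshold).any(axis=-1)


def inpaint_background(rgb: np.ndarray, mask: np.ndarray,
                       fill_value: int = 255) -> np.ndarray:
    """Overwrite every non-tissue pixel with a uniform fill value.

    The input image is left untouched; a modified copy is returned.
    """
    out = rgb.copy()
    out[~mask] = fill_value  # boolean indexing selects background pixels
    return out


# Toy example: a 4x4 slide with a 2x2 stained patch in the middle
# and one slightly off-white background artifact in the corner.
image = np.full((4, 4, 3), 240, dtype=np.uint8)
image[1:3, 1:3] = (180, 90, 160)
image[0, 0] = (235, 236, 234)

mask = tissue_mask(image)
cleaned = inpaint_background(image, mask)
```

After this step the stained patch is unchanged while every background pixel, including the off-white artifact, carries the same value, so a downstream model has no background variation to latch onto as a shortcut.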
