Detecting Hallucinations in Virtual Histology with Neural Precursors

Ji-Hun Oh, Kianoush Falahkheirkhah, Rohit Bhargava

TL;DR

This work introduces a scalable, post-hoc hallucination detection method that identifies a Neural Hallucination Precursor (NHP) from virtual staining (VS) model embeddings for test-time detection, and reports extensive validation across diverse and challenging VS settings to demonstrate NHP's effectiveness and robustness.

Abstract

Significant biomedical research and clinical care rely on the histopathologic examination of tissue structure using microscopy of stained tissue. Virtual staining (VS) offers a promising alternative with the potential to reduce cost and eliminate the use of toxic reagents. However, the critical challenge of hallucinations limits confidence in its use, necessitating a VS co-pilot to detect these hallucinations. Here, we first formally establish the problem of hallucination detection in VS. Next, we introduce a scalable, post-hoc hallucination detection method that identifies a Neural Hallucination Precursor (NHP) from VS model embeddings for test-time detection. We report extensive validation across diverse and challenging VS settings to demonstrate NHP's effectiveness and robustness. Furthermore, we show that VS models with fewer hallucinations do not necessarily disclose them better, risking a false sense of security when reporting just the former metric. This highlights the need for a reassessment of current VS evaluation practices.

Paper Structure

This paper contains 20 sections, 9 equations, 9 figures, 3 tables.

Figures (9)

  • Figure 1: Hallucination examples. Symptoms can be transparent (left) or realistic (right). The latter is particularly challenging to identify as the hallucination is within the target data manifold.
  • Figure 2: Hallucination causes. We confirm each factor by ablating specific components (bottom row) in the VS pipeline and observing the resultant drop in MS-SSIM. The adopted experiment is from §5: VS of HE (target) from 4 SRS bands (source), trained by Pix2PixHD (Wang et al., 2018) and evaluated over an ID test set. OOD and adversarial example details are in §5.3. Similar results were seen with PSNR and LPIPS.
  • Figure 3: The data dependency of hallucinogenic factors. We plot the MS-SSIM per sample in Fig. 2 before (x-axis) vs. after (y-axis) select ablations: (a) reducing SRS source content from 4 to 2 bands, (b) underspecification by switching Pix2PixHD with CycleGAN (Zhu et al., 2017), (c) shifting the ID test data to become OOD.
  • Figure 4: UMAP of ID embeddings. We visualize the Uniform Manifold Approximation and Projection (UMAP) (McInnes et al., 2018) of the VS model embeddings for ID. In addition to the SRS-to-HE experiment in Fig. 2 (left), we also show a HO342-to-CD3 case from §5 (right). Color code reflects low/high MS-SSIM, with top hallucinatory instances (lowest MS-SSIM) marked by $\star$. Note how outliers are not necessarily hallucinations and vice versa.
  • Figure 5: Schematic pipeline of NHP. The hallucination-free feature bank and NHP parameters are determined beforehand using a calibration dataset, which may be sampled from the training set. During VS inference, we extract the feature, compute FN and KNN, and balance them per Eq. 8 to estimate the VS prediction confidence.
  • ...and 4 more figures
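
The pipeline described in the Figure 5 caption can be sketched roughly as follows. This is an illustrative sketch under stated assumptions, not the authors' implementation. The caption does not expand "FN", so the sketch reads it as a feature-norm term. Eq. 8 is not reproduced in this section, so the weighted difference used to balance the two terms is a hypothetical stand-in. The function names, the `quality_threshold` rule for building the hallucination-free bank, and the parameters `k` and `alpha` are likewise hypothetical.

```python
import numpy as np


def build_feature_bank(calib_features, calib_quality, quality_threshold):
    """Keep calibration embeddings judged hallucination-free.

    ASSUMPTION: "hallucination-free" is approximated by a quality score
    (for example MS-SSIM against the real stain) at or above a threshold.
    """
    keep = calib_quality >= quality_threshold
    return calib_features[keep]


def knn_distance(feature, bank, k=1):
    """Euclidean distance from `feature` to its k-th nearest bank entry."""
    dists = np.linalg.norm(bank - feature, axis=1)
    return np.sort(dists)[k - 1]


def confidence(feature, bank, k=1, alpha=0.5):
    """Combine a feature-norm term and a KNN term into one score.

    ASSUMPTION: FN is read as the L2 norm of the embedding, and the
    weighted difference below is a placeholder for Eq. 8, not Eq. 8 itself.
    Higher values are treated as higher confidence in the VS prediction.
    """
    fn = np.linalg.norm(feature)
    knn = knn_distance(feature, bank, k)
    return alpha * fn - (1.0 - alpha) * knn


# Calibration step (done beforehand): toy 2-D embeddings with quality scores.
calib_features = np.array([[1.0, 0.0], [0.0, 1.0], [5.0, 5.0]])
calib_quality = np.array([0.9, 0.8, 0.2])
bank = build_feature_bank(calib_features, calib_quality, quality_threshold=0.5)

# Test-time step: score two embeddings with equal norm.
near = np.array([1.0, 0.0])   # coincides with a bank entry
far = np.array([-1.0, 0.0])   # same norm, farther from the bank
score_near = confidence(near, bank)
score_far = confidence(far, bank)
```

In this toy setup the two test embeddings have the same norm, so any difference in their scores comes from the KNN term alone: the embedding that sits on a bank entry receives the higher score.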