Logit-Level Uncertainty Quantification in Vision-Language Models for Histopathology Image Analysis

Betul Yurdem; Ferhat Ozgur Catak; Murat Kuzlu; Mehmet Kemal Gullu

TL;DR

The proposed logit-level uncertainty quantification framework for histopathology image analysis with VLMs reveals a critical separation in uncertainty behavior between general medical VLMs and the pathology-specific PRISM model. This underscores the importance of logit-level uncertainty quantification for evaluating the trustworthiness of VLMs in histopathology applications.

Abstract

Vision-Language Models (VLMs), with their multimodal capabilities, have demonstrated remarkable success across a wide range of domains, including education, transportation, healthcare, energy, finance, law, and retail. Nevertheless, using VLMs in healthcare raises crucial concerns about the sensitivity of large-scale medical data and the trustworthiness of these models (reliability, transparency, and security). To address these concerns, this study proposes a logit-level uncertainty quantification (UQ) framework for histopathology image analysis with VLMs. UQ is evaluated for three VLMs using metrics derived from temperature-controlled output logits. The framework reveals a critical separation in uncertainty behavior. VILA-M3-8B and LLaVA-Med v1.5 show high stochastic sensitivity (mean cosine similarity (CS) $<0.71$ and $<0.84$, Jensen-Shannon divergence (JS) $<0.57$ and $<0.38$, and Kullback-Leibler divergence (KL) $<0.55$ and $<0.35$, respectively), near-maximal temperature impacts ($\Delta_T \approx 1.00$), and abrupt uncertainty transitions, particularly for complex diagnostic prompts. In contrast, the pathology-specific PRISM model maintains near-deterministic behavior (mean CS $>0.90$, JS $<0.10$, KL $<0.09$) and minimal temperature effects across all prompt complexities. These findings emphasize the importance of logit-level uncertainty quantification for evaluating the trustworthiness of VLMs in histopathology applications.
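The abstract states that CS, JS, and KL are derived from temperature-controlled output logits but does not spell out the computation here. The following is a minimal sketch, not the authors' code, assuming that logits are turned into probabilities with a temperature-scaled softmax and that CS is computed on those probability vectors. The function names and the logit values are hypothetical and chosen only for illustration.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax: p_i = exp(z_i / T) / sum_j exp(z_j / T)."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [z / temperature for z in logits]
    shift = max(scaled)  # subtract the max logit for numerical stability
    exps = [math.exp(s - shift) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def cosine_similarity(p, q):
    """Cosine of the angle between two vectors (1.0 means identical direction)."""
    dot = sum(a * b for a, b in zip(p, q))
    norm_p = math.sqrt(sum(a * a for a in p))
    norm_q = math.sqrt(sum(b * b for b in q))
    return dot / (norm_p * norm_q)


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) in nats; asymmetric, and eps guards against log(0)."""
    return sum(a * math.log((a + eps) / (b + eps)) for a, b in zip(p, q) if a > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence: symmetric and bounded above by ln 2 in nats."""
    m = [(a + b) / 2 for a, b in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


# Usage: next-token logits from two repeated runs on the same image and prompt
# (hypothetical numbers for illustration only).
run_a = [2.0, 1.0, 0.1]
run_b = [1.8, 1.2, 0.3]
for T in (0.2, 1.0, 2.0):
    p, q = softmax(run_a, T), softmax(run_b, T)
    print(f"T={T}: CS={cosine_similarity(p, q):.3f} "
          f"JS={js_divergence(p, q):.4f} KL={kl_divergence(p, q):.4f}")
```

Higher temperatures flatten each distribution, so the same pair of logit vectors can yield different agreement scores as $T$ changes. JS is often preferred for reporting because, unlike KL, it is symmetric and bounded.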

Paper Structure

This paper contains 33 sections, 13 equations, 6 figures, 4 tables, and 1 algorithm.

Introduction
System Overview
High-Level Description
Embedding-Space Characterization
Image Embedding Extraction
Temperature-Dependent Autoregressive Generation
Logit Tensor Normalization and Pairwise Comparisons
Cosine Similarity (CS)
Kullback–Leibler (KL) Divergence
Jensen-Shannon (JS) Divergence
Mean Absolute Error (MAE)
Algorithmic Pipeline
Experimental Design
Dataset Configuration
Computational Efficiency
...and 18 more sections
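The outline lists logit tensor normalization, pairwise comparisons, and MAE as pipeline steps, but this summary does not give their exact form. The following is a minimal sketch under stated assumptions. Each repeated run is stored as a steps × vocabulary list of logit vectors. Runs of unequal length are truncated to the shortest run, which is an assumed normalization and not necessarily the authors'. A per-step metric is averaged over all unordered pairs of runs. `normalize_runs` and `mean_pairwise` are hypothetical names.

```python
import math
from itertools import combinations
from statistics import mean


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over one vocabulary-sized logit vector."""
    scaled = [z / temperature for z in logits]
    shift = max(scaled)  # subtract the max logit for numerical stability
    exps = [math.exp(s - shift) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def normalize_runs(runs, temperature=1.0):
    """Hypothetical logit-tensor normalization.

    Repeated generations may stop at different lengths, so every run is
    truncated to the shortest one, and each step is converted to probabilities.
    """
    n_steps = min(len(run) for run in runs)
    return [[softmax(step, temperature) for step in run[:n_steps]] for run in runs]


def mae(p, q):
    """Mean absolute error between two probability vectors."""
    return sum(abs(a - b) for a, b in zip(p, q)) / len(p)


def mean_pairwise(runs, metric, temperature=1.0):
    """Average a per-step metric over all unordered pairs of repeated runs."""
    probs = normalize_runs(runs, temperature)
    pair_scores = [
        mean(metric(p, q) for p, q in zip(run_a, run_b))
        for run_a, run_b in combinations(probs, 2)
    ]
    return mean(pair_scores)


# Usage: three hypothetical runs with a 3-token vocabulary; the second run
# emits one extra step, which the truncation discards.
runs = [
    [[2.0, 0.5, 0.1], [0.3, 1.9, 0.2]],
    [[1.9, 0.6, 0.1], [0.2, 2.0, 0.1], [1.0, 1.0, 1.0]],
    [[2.1, 0.4, 0.2], [0.4, 1.8, 0.3]],
]
print(mean_pairwise(runs, mae, temperature=0.7))
```

Averaging over unordered pairs treats every run symmetrically, so $N$ repeated runs yield $N(N-1)/2$ comparisons. An asymmetric metric such as KL may instead warrant averaging both orderings or comparing against a fixed reference run.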

Figures (6)

Figure 1: Diagram with key highlights from the proposed logit-level uncertainty quantification framework.
Figure 2: Embedding spaces of the evaluated VLMs and positions of the histopathological patches used.
Figure 3: Normalized CS versus temperature for the three VLMs across three question-complexity levels. Higher values indicate greater consistency between repeated iterations.
Figure 4: Normalized JS divergence versus temperature. Lower values indicate less uncertainty between repeated iterations.
Figure 5: Normalized KL divergence versus temperature. Lower values indicate greater stability and more reproducible probability distributions.
...and 1 more figure
