Improving Medical Diagnostics with Vision-Language Models: Convex Hull-Based Uncertainty Analysis

Ferhat Ozgur Catak, Murat Kuzlu, Taylor Patrick

TL;DR

This work tackles the crucial issue of uncertainty in vision-language models (VLMs) applied to medical Visual Question Answering by proposing a convex hull–based uncertainty metric. The authors use LLM-CXR to generate radiology reports from chest X-ray images at five temperature settings, producing $n=30$ responses per image. Each response $r_i$ is embedded as $E(r_i)\in\mathbb{R}^d$, the embeddings are projected to 2D via PCA, and the projected points are clustered with DBSCAN so that a convex hull area can be computed for each cluster. The total uncertainty for prompt $p$ at temperature $t$ is defined as $A(p,t)=\sum_{c\in L,c\neq -1}\text{Area}(\text{ConvexHull}(c))$, where $L$ is the set of DBSCAN cluster labels and $-1$ marks noise points. The mathematical justification shows that larger response diversity yields larger hull areas, while temperature modulates this spread with $\frac{\partial A(p,t)}{\partial t} > 0$. Experimental results across temperatures reveal clear trends: near-zero uncertainty at very low $t$, and increasing, sometimes bimodal, uncertainty at higher $t$, with notable implications for data quality and the need for explainable AI integration to enhance trust in clinical deployment. Overall, the paper provides a concrete, geometry‑based uncertainty metric that can guide the development and evaluation of safe, reliable medical VLMs in radiology reporting.
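The metric described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `convex_hull_uncertainty`, the DBSCAN parameters `eps` and `min_samples`, and the choice to skip degenerate clusters are all hypothetical, and the response embeddings are assumed to be already computed by some text encoder.

```python
import numpy as np
from scipy.spatial import ConvexHull, QhullError
from sklearn.cluster import DBSCAN
from sklearn.decomposition import PCA


def convex_hull_uncertainty(embeddings, eps=0.5, min_samples=3):
    """Sum of convex hull areas over DBSCAN clusters of 2D-projected embeddings.

    embeddings: array of shape (n_responses, d), one row per response E(r_i).
    eps, min_samples: DBSCAN settings (assumed values, not taken from the paper).
    """
    # Project the d-dimensional embeddings onto their first two principal components.
    points_2d = PCA(n_components=2).fit_transform(np.asarray(embeddings, dtype=float))

    # Cluster the projected points; DBSCAN labels noise points as -1.
    labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(points_2d)

    total_area = 0.0
    for label in set(labels):
        if label == -1:
            continue  # noise is excluded from A(p, t)
        cluster = points_2d[labels == label]
        if len(cluster) < 3:
            continue  # fewer than 3 points cannot enclose any area
        try:
            # For 2D input, ConvexHull.volume is the enclosed area
            # (ConvexHull.area would be the perimeter).
            total_area += ConvexHull(cluster).volume
        except QhullError:
            # Collinear or coincident points: treated here as zero area (an assumption).
            pass
    return total_area
```

Calling the function once per temperature on the embeddings of the 30 responses for an image would give one value of $A(p,t)$ per setting. The result depends strongly on `eps`: if it is too small relative to the spread of the projected points, every response is labeled noise and the area collapses to zero, so the clustering parameters need to be fixed across temperatures for the values to be comparable.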

Abstract

In recent years, vision-language models (VLMs) have been applied to various fields, including healthcare, education, finance, and manufacturing, with remarkable performance. However, concerns remain regarding VLMs' consistency and uncertainty, particularly in critical applications such as healthcare, which demand a high level of trust and reliability. This paper proposes a novel approach to evaluating uncertainty in VLMs' responses, using convex hulls, and applies it to a healthcare Visual Question Answering (VQA) task. The LLM-CXR model is selected as the medical VLM and is used to generate responses for a given prompt at five temperature settings: 0.001, 0.25, 0.50, 0.75, and 1.00. According to the results, the LLM-CXR VLM shows high uncertainty at higher temperature settings. The experimental outcomes emphasize the importance of accounting for uncertainty in VLMs' responses, especially in healthcare applications.

Paper Structure

This paper contains 32 sections, 9 equations, 31 figures, 1 table.

Figures (31)

  • Figure 1: The overall experimental setup for calculating uncertainty in VLM responses.
  • Figure 2: Uncertainty distribution at the temperature setting of 0.001
  • Figure 3: Most uncertain instances (a-b) at the temperature setting of 0.001
  • Figure 4: Uncertainty distribution at the temperature setting of 0.25
  • Figure 5: Most uncertain instances (a-b) at the temperature setting of 0.25
  • ...and 26 more figures