Improving Medical Diagnostics with Vision-Language Models: Convex Hull-Based Uncertainty Analysis
Ferhat Ozgur Catak, Murat Kuzlu, Taylor Patrick
TL;DR
This work addresses uncertainty in vision-language models (VLMs) applied to medical Visual Question Answering by proposing a convex hull-based uncertainty metric. The authors use LLM-CXR to generate radiology reports from chest X-ray images across five temperature settings, producing $n=30$ responses per image. Each response $r_i$ is embedded as $E(r_i)\in\mathbb{R}^d$, the embeddings are projected to 2D via PCA, and the projected points are clustered with DBSCAN so that a convex hull area can be computed for each cluster. The total uncertainty for prompt $p$ at temperature $t$ is defined as $A(p,t)=\sum_{c\in L,c\neq -1}\text{Area}(\text{ConvexHull}(c))$, where $L$ is the set of cluster labels and $-1$ marks noise points. The mathematical justification shows that greater response diversity yields larger hull areas, and that temperature modulates this spread with $\frac{\partial A(p,t)}{\partial t} > 0$. Experimental results reveal clear trends: near-zero uncertainty at very low $t$, and increasing, sometimes bimodal, uncertainty at higher $t$. These findings have implications for data quality and point to the need for explainable AI integration to build trust in clinical deployment. Overall, the paper provides a concrete, geometry-based uncertainty metric that can guide the development and evaluation of safe, reliable medical VLMs for radiology reporting.
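To make the definition of $A(p,t)$ concrete, the following is a minimal sketch of the clustering and hull-area steps in plain Python. It is not the authors' implementation: it assumes the response embeddings have already been projected to 2D (the embedding model and the PCA step are not reproduced), and the function names `dbscan`, `hull_area`, and `total_uncertainty`, as well as the default values `eps=0.5` and `min_samples=3`, are hypothetical choices made only for illustration.

```python
from math import dist


def dbscan(points, eps, min_samples):
    """Minimal DBSCAN: returns one label per point; -1 marks noise."""
    n = len(points)
    # Each neighbourhood includes the point itself.
    neighbors = [[j for j in range(n) if dist(points[i], points[j]) <= eps]
                 for i in range(n)]
    labels = [None] * n
    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbors[i]) < min_samples:
            labels[i] = -1  # provisional noise; may become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(neighbors[i])
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(neighbors[j]) >= min_samples:
                queue.extend(neighbors[j])  # j is a core point: keep expanding
    return labels


def hull_area(points):
    """Area of the 2D convex hull (monotone chain, then shoelace formula)."""
    pts = sorted(set(points))
    if len(pts) < 3:
        return 0.0

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    def half(seq):
        chain = []
        for p in seq:
            while len(chain) >= 2 and cross(chain[-2], chain[-1], p) <= 0:
                chain.pop()
            chain.append(p)
        return chain[:-1]

    hull = half(pts) + half(reversed(pts))
    twice_area = sum(a[0] * b[1] - b[0] * a[1]
                     for a, b in zip(hull, hull[1:] + hull[:1]))
    return abs(twice_area) / 2.0


def total_uncertainty(points_2d, eps=0.5, min_samples=3):
    """A(p, t): sum of hull areas over all non-noise clusters.

    eps and min_samples are illustrative defaults, not values from the paper.
    """
    pts = [tuple(p) for p in points_2d]
    labels = dbscan(pts, eps, min_samples)
    clusters = {}
    for p, label in zip(pts, labels):
        if label != -1:  # noise points are excluded, as in the definition
            clusters.setdefault(label, []).append(p)
    return sum(hull_area(c) for c in clusters.values())
```

Under these assumptions, a set of identical responses collapses to a single point and gives an area of zero, which matches the near-zero uncertainty reported at very low $t$, while a more widely spread cluster gives a larger area. In practice the same pipeline could be assembled from library implementations of PCA, DBSCAN, and convex hulls; the hand-written versions here only keep the sketch self-contained.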
Abstract
In recent years, vision-language models (VLMs) have been applied to various fields, including healthcare, education, finance, and manufacturing, with remarkable performance. However, concerns remain regarding the consistency and uncertainty of VLMs, particularly in critical applications such as healthcare, which demand a high level of trust and reliability. This paper proposes a novel convex hull-based approach to evaluating uncertainty in VLM responses, applied to a healthcare Visual Question Answering (VQA) task. The LLM-CXR model is selected as the medical VLM and is used to generate responses for a given prompt at five temperature settings: 0.001, 0.25, 0.50, 0.75, and 1.00. According to the results, LLM-CXR exhibits high uncertainty at higher temperature settings. The experimental outcomes emphasize the importance of quantifying uncertainty in VLM responses, especially in healthcare applications.
