
Enhancing the Reliability of Medical AI through Expert-guided Uncertainty Modeling

Aleksei Khalin, Ekaterina Zaychenkova, Aleksandr Yugay, Andrey Goncharov, Sergey Korchagin, Alexey Zaytsev, Egor Ershov

Abstract

Artificial intelligence (AI) systems accelerate medical workflows and improve diagnostic accuracy in healthcare, serving as second-opinion systems. However, the unpredictability of AI errors poses a significant challenge, particularly in healthcare contexts, where mistakes can have severe consequences. A widely adopted safeguard is to pair predictions with uncertainty estimation, enabling human experts to focus on high-risk cases while streamlining routine verification. Current uncertainty estimation methods, however, remain limited, particularly in quantifying aleatoric uncertainty, which arises from data ambiguity and noise. To address this, we propose a novel approach that leverages disagreement in expert responses to generate targets for training machine learning models. These targets are used in conjunction with standard data labels to estimate two components of uncertainty separately, as given by the law of total variance, via a two-ensemble approach, as well as its lightweight variant. We validate our method on binary image classification, binary and multi-class image segmentation, and multiple-choice question answering. Our experiments demonstrate that incorporating expert knowledge can enhance uncertainty estimation quality by $9\%$ to $50\%$ depending on the task, making this source of information invaluable for the construction of risk-aware AI systems in healthcare applications.
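The decomposition described above follows the law of total variance: total predictive variance splits into an epistemic term (disagreement across models trained on ground-truth labels) and an aleatoric term (label noise, here estimated from an ensemble fine-tuned on expert-derived "soft" labels). A minimal sketch of this two-ensemble estimator for binary classification, assuming each ensemble outputs per-sample positive-class probabilities (function names and the per-member probability arrays are illustrative, not the paper's implementation):

```python
import numpy as np

def epistemic_uncertainty(base_probs):
    """Disagreement across base-ensemble members: variance over members
    of the predicted positive-class probability (shape: members x samples)."""
    return np.var(base_probs, axis=0)

def aleatoric_uncertainty(soft_probs):
    """Label-noise estimate from the ensemble fine-tuned on expert 'soft'
    labels: Bernoulli variance p*(1-p) of the averaged prediction."""
    p = np.mean(soft_probs, axis=0)
    return p * (1.0 - p)

def total_uncertainty(base_probs, soft_probs):
    """Law of total variance: total = epistemic + aleatoric."""
    return epistemic_uncertainty(base_probs) + aleatoric_uncertainty(soft_probs)
```

A usage example: with two base members predicting `[0.6, 0.9]` and `[0.4, 0.8]` for two samples, the epistemic term is their per-sample variance, while the aleatoric term comes entirely from the soft-label ensemble.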

Paper Structure

This paper contains 1 section, 7 equations, 8 figures, 3 tables.

Figures (8)

  • Figure 1: Integration of expert knowledge into aleatoric uncertainty estimation provides total UE quality enhancement.
  • Figure 2: Overview of the proposed method. First, an ensemble of models is trained to predict ground truth labels from the dataset. Next, a different ensemble is created via fine-tuning on "soft" labels acquired from expert annotations. The two ensembles are then used to estimate epistemic and aleatoric uncertainty, respectively, given by the corresponding equations for epistemic and aleatoric uncertainty. Finally, we evaluate the performance of algorithms with different uncertainty thresholds and construct rejection curves.
  • Figure 3: Rejection curves for proposed methods across four different machine learning tasks. Shaded areas represent the standard deviation between ensemble models. The higher the curve, the better.
  • Figure 4: (a) Pairwise consistency between experts, estimated as Cohen's kappa coefficient; values on the diagonal are the experts' accuracies. (b) Uncertainty estimation quality under the law of total variance, with aleatoric uncertainty estimated from different sources. Experts, best denotes using the best $k$ experts by accuracy; Experts, average is computed by averaging AAC over all possible sets of $k$ experts.
  • Figure 5: Uncertainty estimation quality (AAC) on the multiple-choice question answering task as a function of the sizes of the base ensemble and the CAE. Lower values are better.
  • ...and 3 more figures
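The rejection curves referenced in Figure 3 plot accuracy on the retained samples as progressively more of the most uncertain predictions are handed off to a human expert. A minimal sketch of how such a curve can be computed (assuming a 0/1 per-sample error indicator and a per-sample uncertainty score; this is an illustration, not the paper's evaluation code):

```python
import numpy as np

def rejection_curve(errors, uncertainty):
    """Accuracy on the retained set as increasingly uncertain samples
    are rejected. `errors` is a 0/1 array (1 = model was wrong);
    `uncertainty` is a per-sample uncertainty score."""
    order = np.argsort(uncertainty)       # most certain samples first
    sorted_errors = np.asarray(errors)[order]
    n = len(sorted_errors)
    # retain the k most certain samples for k = n, n-1, ..., 1
    accs = [1.0 - sorted_errors[:k].mean() for k in range(n, 0, -1)]
    return np.array(accs)
```

If uncertainty ranks errors well, the curve rises as the rejection rate grows: discarding the most uncertain samples removes mostly misclassified ones, which is exactly the behavior the figure compares across methods.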