Expert-aware uncertainty estimation for quality control of neural-based blood typing
Ekaterina Zaychenkova, Dmitrii Iarchuk, Sergey Korchagin, Alexey Zaitsev, Egor Ershov
TL;DR
This work tackles uncertainty estimation for neural networks that serve as medical second-opinion systems by introducing expert-aware uncertainty quantification (EAUQ), which combines ground-truth labels with expert assessments of case complexity. It decomposes uncertainty into aleatoric and epistemic parts, $UQ(x) = UQ_{\mathfrak{a}}(x) + UQ_{\mathfrak{e}}(x)$, and estimates them with two signals. The ensemble standard deviation $\mathrm{STD}(x)$ captures the epistemic component. The expert-derived metric $MP(x) = 1 - \max(\overline{e}(x), 1 - \overline{e}(x))$, where $\overline{e}(x)$ is the average expert response, captures the aleatoric component. The authors implement two approaches: a 20-net ensemble (CE) augmented with $MP$, and a deterministic Expert-Aware Network (EAN) trained to emulate average expert responses, which is extended to an Expert-Aware Ensemble (EAE). The new BloodyWell dataset contains 3139 serology images, each with assessments from six experts, and enables evaluation of uncertainty estimation in blood typing (ABO, RH, KELL). Experiments show a 2.5× improvement in uncertainty estimation with expert labels and a 35% gain with neural-based estimates of expert consensus, highlighting the practical value of combining expert insight with ensemble approaches for safe, reliable medical AI.
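The two uncertainty signals in the TL;DR can be illustrated with a short Python sketch. It assumes that each expert's response for an image is a number in [0, 1] (for example, a binary vote), that $\overline{e}(x)$ is their mean, and that the ensemble outputs one probability per member. The helper names (`mean_expert_response`, `mp_score`, `ensemble_std`, `combined_uq`) are hypothetical, the use of the population standard deviation is an assumption, and summing the two signals only mirrors the stated decomposition; it is not presented as the authors' exact scoring rule.

```python
from statistics import mean, pstdev
from typing import Sequence


def mean_expert_response(responses: Sequence[float]) -> float:
    """Average expert response e_bar(x); each response is assumed to lie in [0, 1]."""
    return float(mean(responses))


def mp_score(e_bar: float) -> float:
    """MP(x) = 1 - max(e_bar, 1 - e_bar).

    Returns 0.0 when the experts (or a model emulating them) agree unanimously
    and 0.5 at a perfect 50/50 split.
    """
    return 1.0 - max(e_bar, 1.0 - e_bar)


def ensemble_std(member_probs: Sequence[float]) -> float:
    """Disagreement between ensemble members: population std of their probabilities."""
    return pstdev(member_probs)


def combined_uq(member_probs: Sequence[float], e_bar: float) -> float:
    # Assumption: the two signals are simply added, mirroring UQ = UQ_a + UQ_e.
    return ensemble_std(member_probs) + mp_score(e_bar)


# Usage with hypothetical data: six expert votes (1 = positive) and five ensemble members.
votes = [1, 1, 1, 1, 1, 0]
probs = [0.91, 0.88, 0.95, 0.79, 0.90]
e_bar = mean_expert_response(votes)
print(f"e_bar={e_bar:.3f}  MP={mp_score(e_bar):.3f}  UQ={combined_uq(probs, e_bar):.3f}")
```

Because `mp_score` takes $\overline{e}(x)$ directly, the same function works whether the average comes from the six expert assessments or from a network trained to emulate them, such as the EAN.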
Abstract
In medical diagnostics, accurate uncertainty estimation for neural-based models is essential for complementing second-opinion systems. Although neural network ensembles are proficient at this task, a gap persists between actual uncertainties and predicted estimates. A major difficulty is the lack of labels on the hardness of examples: a typical dataset includes only ground-truth target labels, which makes uncertainty estimation an almost unsupervised problem. Our novel approach narrows this gap by integrating expert assessments of case complexity into the neural network's learning process, using both definitive target labels and supplementary complexity ratings. We validate the methodology on blood typing using a new dataset, BloodyWell, which is unique in pairing labeled reaction images with complexity scores from six medical specialists. Experiments show that our approach improves uncertainty prediction, achieving a 2.5-fold improvement with expert labels and a 35% increase in performance with neural-based estimates of expert consensus.
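One way a learning process could use both the definitive target labels and the expert ratings is sketched below in Python. This is a hypothetical two-head objective, not the authors' published loss: `p_label` is fitted to the hard ground-truth label, `p_expert` is fitted to the average expert response used as a soft target, and `lam` is an assumed weighting hyperparameter. Binary cross-entropy with a soft target is minimized when the prediction equals the target, so the second term pushes `p_expert` toward the experts' average response.

```python
import math


def bce(prob: float, target: float, eps: float = 1e-7) -> float:
    """Binary cross-entropy; `target` may be soft (any value in [0, 1])."""
    p = min(max(prob, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(target * math.log(p) + (1.0 - target) * math.log(1.0 - p))


def expert_aware_loss(p_label: float, y_true: int,
                      p_expert: float, e_bar: float, lam: float = 1.0) -> float:
    """Hypothetical objective combining both kinds of supervision.

    - p_label is trained against the hard ground-truth label y_true (0 or 1);
    - p_expert is trained against the soft average expert response e_bar;
    - lam (assumed) balances the two terms.
    """
    return bce(p_label, y_true) + lam * bce(p_expert, e_bar)


# Usage: for a fixed label prediction, the loss is lowest when p_expert matches e_bar.
e_bar = 0.8
for p_expert in (0.6, 0.8, 0.95):
    print(p_expert, round(expert_aware_loss(0.9, 1, p_expert, e_bar), 4))
```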
