Ensuring Ground Truth Accuracy in Healthcare with the EVINCE framework
Edward Y. Chang
TL;DR
The paper addresses misdiagnosis and the propagation of mislabeled ground-truth data into ML-driven clinical workflows. It introduces EVINCE, an entropy-based framework in which multiple LLMs engage in structured, contentious debates, guided by information-duality principles that balance exploration and convergence. The core contributions are the IDEA theory for optimal LLM pairing (one high-entropy and one low-entropy model with equal information quality) and Algorithmic Robust Aggregation (ARA), which minimizes online regret and stabilizes predictions. Empirical studies on distinguishing Dengue from Chikungunya and on ground-truth robustness and remediation show gains in diagnostic accuracy ranging from modest to notable, and reveal how debate-driven uncertainty can surface ground-truth inconsistencies for remediation. Together, these results suggest that EVINCE offers a practical way to improve diagnostic precision and to audit and refine historical medical labels, with potential impact on patient safety and trust in AI-augmented healthcare.
Abstract
Misdiagnosis is a significant issue in healthcare, leading to harmful consequences for patients. The propagation of mislabeled data through machine learning models into clinical practice is unacceptable. This paper proposes EVINCE, a system designed to 1) improve diagnosis accuracy and 2) rectify misdiagnoses and minimize training data errors. EVINCE stands for Entropy Variation through Information Duality with Equal Competence; the system applies this novel theory to optimize the diagnostic process using multiple Large Language Models (LLMs) in a structured debate framework. Our empirical study shows that EVINCE is effective in achieving both design goals.
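To make the entropy terminology concrete, the following is a minimal sketch, not the paper's implementation, of how the Shannon entropy of a model's predicted diagnosis distribution could be computed and compared. The function name shannon_entropy, the label "Other", and both probability distributions are hypothetical values chosen only for illustration.

```python
import math


def shannon_entropy(probs):
    """Return the Shannon entropy (in bits) of a discrete distribution.

    The input is normalized first, so unnormalized scores are accepted.
    """
    values = list(probs)
    total = sum(values)
    if total <= 0:
        raise ValueError("probabilities must sum to a positive value")
    # Zero-probability outcomes contribute nothing to the entropy.
    return -sum((p / total) * math.log2(p / total) for p in values if p > 0)


# Hypothetical predictions from two LLMs (illustrative numbers only).
exploratory = {"Dengue": 0.40, "Chikungunya": 0.35, "Other": 0.25}
confident = {"Dengue": 0.90, "Chikungunya": 0.08, "Other": 0.02}

h_high = shannon_entropy(exploratory.values())
h_low = shannon_entropy(confident.values())

# The spread-out distribution has the higher entropy of the two.
print(f"exploratory: {h_high:.3f} bits, confident: {h_low:.3f} bits")
```

In this sketch, the model that spreads probability across several diagnoses plays the high-entropy (exploratory) role, while the model that concentrates probability on one diagnosis plays the low-entropy role.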
