Majority Voting of Doctors Improves Appropriateness of AI Reliance in Pathology
Hongyan Gu, Chunxu Yang, Shino Magaki, Neda Zarrin-Khameh, Nelli S. Lakis, Inma Cobos, Negar Khanlou, Xinhai R. Zhang, Jasmeet Assi, Joshua T. Byers, Ameer Hamza, Karam Han, Anders Meyer, Hilda Mirbaha, Carrie A. Mohila, Todd M. Stevens, Sara L. Stone, Wenzhong Yan, Mohammad Haeri, Xiang 'Anthony' Chen
TL;DR
This study validates majority voting as a practical approach to foster appropriate AI reliance in pathology, specifically for mitosis detection. Across a multi-institutional cohort of 32 pathologists, groups as small as three AI-assisted clinicians produced higher Relative AI Reliance and Relative Self-Reliance than a single pathologist collaborating with AI, while also improving precision and maintaining competitive recall. The work details an AI-assisted, XAI-enabled interface, an offline majority-synthesis protocol (odd group sizes k = 3, 5, 7, ..., 27), and statistical analyses of the benefits and costs of majority voting. The findings support adopting AI+k frameworks to balance efficiency and reliability in high-stakes medical decisions and offer insights transferable to other critical visual tasks requiring human–AI collaboration.
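The offline majority-synthesis step can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the per-candidate boolean encoding of decisions, and the random sampling of a group of size k are all assumptions made for illustration.

```python
import random
from typing import Dict, List, Sequence


def majority_vote(votes: Sequence[bool]) -> bool:
    """Return True if more than half of the votes are True.

    An odd number of votes is required so that ties cannot occur.
    """
    if len(votes) % 2 == 0:
        raise ValueError("group size k must be odd to avoid ties")
    return sum(votes) > len(votes) // 2


def synthesize_group_decisions(
    decisions: Dict[str, List[bool]], k: int, rng: random.Random
) -> List[bool]:
    """Sample k pathologists (hypothetical IDs) and majority-vote per candidate.

    decisions maps a pathologist ID to that person's AI-assisted yes/no
    decision for each mitosis candidate, in the same candidate order.
    """
    group = rng.sample(sorted(decisions), k)
    n_candidates = len(decisions[group[0]])
    return [
        majority_vote([decisions[p][i] for p in group])
        for i in range(n_candidates)
    ]


# Illustrative data only: three pathologists, four candidates.
example = {
    "P1": [True, False, True, False],
    "P2": [True, True, False, False],
    "P3": [False, True, True, False],
}
group_result = synthesize_group_decisions(example, k=3, rng=random.Random(0))
```

Restricting k to odd values, as in the k = 3, 5, 7, ..., 27 range above, guarantees that every candidate receives a strict majority.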
Abstract
As Artificial Intelligence (AI) makes advancements in medical decision-making, there is a growing need to ensure that doctors develop appropriate reliance on AI to avoid adverse outcomes. However, existing methods for enabling appropriate AI reliance might encounter challenges when applied in the medical domain. In this regard, this work employs and validates an alternative approach, majority voting, to facilitate appropriate reliance on AI in medical decision-making. This is achieved through a multi-institutional user study involving 32 medical professionals with various backgrounds, focusing on the pathology task of visually detecting a pattern, mitoses, in tumor images. Here, the majority voting process was conducted by synthesizing decisions made under AI assistance by a group of pathology doctors (pathologists). Two metrics were used to evaluate the appropriateness of AI reliance: Relative AI Reliance (RAIR) and Relative Self-Reliance (RSR). Results showed that even with groups of three pathologists, majority-voted decisions significantly increased both RAIR and RSR, by approximately 9% and 31%, respectively, compared to decisions made by one pathologist collaborating with AI. This increased appropriateness resulted in better precision and recall in the detection of mitoses. While our study is centered on pathology, we believe these insights can be extended to general high-stakes decision-making processes involving similar visual tasks.
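The abstract names RAIR and RSR without defining them. The sketch below assumes the definitions commonly used in the appropriate-reliance literature: RAIR is the fraction of cases in which the human was initially wrong, the AI was right, and the human ended up correct; RSR is the fraction of cases in which the human was initially right, the AI was wrong, and the human stayed correct. It also assumes binary decisions and that an initial unaided decision is available for each case. The function name and data layout are hypothetical, and the paper's exact formulation may differ.

```python
from typing import Optional, Sequence, Tuple


def reliance_metrics(
    initial: Sequence[int],
    ai: Sequence[int],
    final: Sequence[int],
    truth: Sequence[int],
) -> Tuple[Optional[float], Optional[float]]:
    """Compute (RAIR, RSR) under the assumed definitions.

    Each sequence holds one binary label per case. A metric is None
    when no case qualifies for its denominator.
    """
    rair_num = rair_den = rsr_num = rsr_den = 0
    for h0, a, h1, y in zip(initial, ai, final, truth):
        if h0 != y and a == y:
            # Human initially wrong, AI right: relying on AI is appropriate.
            rair_den += 1
            rair_num += int(h1 == y)
        elif h0 == y and a != y:
            # Human initially right, AI wrong: self-reliance is appropriate.
            rsr_den += 1
            rsr_num += int(h1 == y)
    rair = rair_num / rair_den if rair_den else None
    rsr = rsr_num / rsr_den if rsr_den else None
    return rair, rsr


# Illustrative data only: five cases, all truly positive.
rair, rsr = reliance_metrics(
    initial=[0, 0, 1, 1, 1],
    ai=[1, 1, 0, 0, 1],
    final=[1, 0, 1, 0, 1],
    truth=[1, 1, 1, 1, 1],
)
```

Cases where the human and the AI are both right or both wrong contribute to neither metric, since they do not reveal whether reliance was appropriate.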
