Majority Voting of Doctors Improves Appropriateness of AI Reliance in Pathology

Hongyan Gu, Chunxu Yang, Shino Magaki, Neda Zarrin-Khameh, Nelli S. Lakis, Inma Cobos, Negar Khanlou, Xinhai R. Zhang, Jasmeet Assi, Joshua T. Byers, Ameer Hamza, Karam Han, Anders Meyer, Hilda Mirbaha, Carrie A. Mohila, Todd M. Stevens, Sara L. Stone, Wenzhong Yan, Mohammad Haeri, Xiang 'Anthony' Chen

TL;DR

This study validates majority voting as a practical approach to foster appropriate AI reliance in pathology, specifically for mitosis detection. Across a multi-institutional cohort of 32 pathologists, groups as small as three AI-assisted clinicians produced higher Relative AI Reliance and Relative Self-Reliance than a single pathologist collaborating with AI, while also improving precision and maintaining competitive recall. The work details an AI-assisted, XAI-enabled interface, a robust offline majority-synthesis protocol (k = 3,5,7,...,27), and comprehensive statistical analyses demonstrating the benefits and costs of majority voting. The findings support adopting AI+k frameworks to balance efficiency and reliability in high-stakes medical decisions and offer insights transferable to other critical visual tasks requiring human–AI collaboration.

Abstract

As Artificial Intelligence (AI) advances in medical decision-making, there is a growing need to ensure that doctors develop appropriate reliance on AI to avoid adverse outcomes. However, existing methods for enabling appropriate AI reliance may encounter challenges when applied in the medical domain. In this regard, this work employs and validates an alternative approach -- majority voting -- to facilitate appropriate reliance on AI in medical decision-making. This is achieved through a multi-institutional user study involving 32 medical professionals with various backgrounds, focusing on the pathology task of visually detecting a pattern, mitoses, in tumor images. Here, the majority voting process was conducted by synthesizing decisions made under AI assistance by a group of pathology doctors (pathologists). Two metrics were used to evaluate the appropriateness of AI reliance: Relative AI Reliance (RAIR) and Relative Self-Reliance (RSR). Results showed that even with groups of three pathologists, majority-voted decisions significantly increased both RAIR and RSR -- by approximately 9% and 31%, respectively -- compared to decisions made by one pathologist collaborating with AI. This increased appropriateness resulted in better precision and recall in the detection of mitoses. While our study is centered on pathology, we believe these insights can be extended to general high-stakes decision-making processes involving similar visual tasks.
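
The abstract names RAIR and RSR without defining them. As a rough reminder, and not necessarily the exact form used in this paper, the appropriate-reliance framework cited in the Figure 5 caption (10.1145/3581641.3584066) defines the two ratios roughly as below. The counting notation $\#\{\cdot\}$ is an assumption introduced here for illustration; the paper's own equations may adapt these definitions to mitosis detection.

```latex
\[
\mathrm{RAIR} =
  \frac{\#\{\text{human initially wrong, AI correct, final decision follows AI}\}}
       {\#\{\text{human initially wrong, AI correct}\}}
\]
\[
\mathrm{RSR} =
  \frac{\#\{\text{human initially correct, AI wrong, final decision keeps human choice}\}}
       {\#\{\text{human initially correct, AI wrong}\}}
\]
```

Under this reading, a higher RAIR means wrong initial decisions are more often corrected by accepting good AI advice, and a higher RSR means correct initial decisions are more often kept when the AI advice is wrong.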

Paper Structure (37 sections, 9 equations, 11 figures, 2 tables)

Figures (11)

  • Figure 1: (a) An example region-of-interest image used in the user study, with arrows pointing at the ground truth mitoses; (b) The antibody test used by the three doctors to annotate the ground truth mitoses. Mitoses appear in brown (as indicated by the arrows) in the antibody test.
  • Figure 2: Organization of the user study.
  • Figure 3: Screenshots of the mitosis study websites: (a) The manual mitosis detection website in the stage 1 study. The user could left-click on the image to leave a mark for each mitosis detected (① -- ③). (b) The AI-assisted mitosis detection website in the stage 2 study. The interface added ① the AI recommendation box; ② a "Show AI" switch, with which the user could toggle AI recommendations on or off; ③ an "AI Sensitivity" slider, with which the user could adjust the sensitivity of the AI to their preference; ④ a warning message reminding users not to rely on AI. (c) The website in stage 2 also provided an XAI evidence card for each AI recommendation. Each XAI evidence card included ① a saliency map; ② a confidence level, including a probability score and a trust score; ③ a bar plot of subclass probabilities; and ④ similar examples. (d) After the user finished examining all images, an evaluation page reported the performance metrics to the participant.
  • Figure 4: Steps for synthesizing the majority voting decisions from $k$ AI-assisted pathologists: (a) random sampling: mitosis reports from an odd number $k$ of randomly sampled, AI-assisted pathologists were collected; (b) majority voting: mitosis candidates reported by $>k/2$ pathologists were retained as the final decision.
  • Figure 5: Combinatorics for reliance incidents in the condition of one pathologist collaborating with AI (i.e., one-human-AI) for the mitosis detection task. This chart is adapted from the framework described in 10.1145/3581641.3584066.
  • ...and 6 more figures
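
The two steps in the Figure 4 caption (random sampling of an odd number $k$ of AI-assisted pathologists, then keeping candidates reported by $>k/2$ of them) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `majority_vote`, the pathologist identifiers, and the assumption that each report has already been reduced to a set of shared candidate IDs (rather than raw click coordinates that would need spatial matching) are all hypothetical.

```python
import random
from collections import Counter


def majority_vote(reports, k, rng=None):
    """Synthesize a majority-voted decision from k sampled pathologists.

    reports: dict mapping a pathologist ID to the set of mitosis
             candidate IDs that pathologist reported (hypothetical format).
    k:       odd group size, no larger than len(reports).
    rng:     optional random.Random instance for reproducible sampling.
    """
    if k < 1 or k % 2 == 0:
        raise ValueError("k must be a positive odd number")
    if k > len(reports):
        raise ValueError("k cannot exceed the number of pathologists")
    rng = rng or random.Random()

    # Step (a): randomly sample k pathologists without replacement.
    sampled = rng.sample(list(reports), k)

    # Count how many sampled pathologists reported each candidate.
    votes = Counter(c for pid in sampled for c in reports[pid])

    # Step (b): keep candidates reported by more than k/2 pathologists.
    return {c for c, n in votes.items() if n > k / 2}


# Hypothetical toy data: three pathologists, three candidates.
example_reports = {
    "p1": {"a", "b"},
    "p2": {"b", "c"},
    "p3": {"a", "b"},
}
decision = majority_vote(example_reports, k=3, rng=random.Random(0))
print(sorted(decision))
```

With all three pathologists sampled, candidate "b" has three votes and "a" has two, so both pass the $>k/2$ threshold, while "c" (one vote) is dropped. Repeating the call with different random seeds would give the repeated-sampling flavor of an offline synthesis; the number of repetitions is not specified in this section.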