People over trust AI-generated medical responses and view them to be as valid as doctors, despite low accuracy

Shruthi Shekar, Pat Pataranutaporn, Chethan Sarabu, Guillermo A. Cecchi, Pattie Maes

TL;DR

Both experts and non-experts exhibited bias, finding AI-generated responses to be more thorough and accurate than Doctors' responses but still valuing the involvement of a Doctor in the delivery of their medical advice.

Abstract

This paper presents a comprehensive analysis of how AI-generated medical responses are perceived and evaluated by non-experts. A total of 300 participants gave evaluations for medical responses that were either written by a medical doctor on an online healthcare platform, or generated by a large language model and labeled by physicians as having high or low accuracy. Results showed that participants could not effectively distinguish between AI-generated and Doctors' responses and demonstrated a preference for AI-generated responses, rating High Accuracy AI-generated responses as significantly more valid, trustworthy, and complete/satisfactory. Low Accuracy AI-generated responses on average performed very similarly to Doctors' responses, if not better. Participants not only found these Low Accuracy AI-generated responses to be valid, trustworthy, and complete/satisfactory but also indicated a high tendency to follow the potentially harmful medical advice and incorrectly seek unnecessary medical attention as a result of the response provided. This problematic reaction was comparable to, if not stronger than, the reaction they displayed towards Doctors' responses. This increased trust placed in inaccurate or inappropriate AI-generated medical advice can lead to misdiagnosis and harmful consequences for individuals seeking help. Further, participants were more trusting of High Accuracy AI-generated responses when told they were given by a doctor, and experts rated AI-generated responses significantly higher when the source of the response was unknown. Both experts and non-experts exhibited bias, finding AI-generated responses to be more thorough and accurate than Doctors' responses but still valuing the involvement of a Doctor in the delivery of their medical advice. Ensuring AI systems are implemented with medical professionals should be the future of using AI for the delivery of medical advice.

Paper Structure (47 sections, 15 figures)

Figures (15)

  • Figure 1: Visual summary of the dataset construction and pipeline of experiments discussed in this paper.
  • Figure 1: Demographics of participants in Experiments 1, 2, and 3. Values represent the number of participants in each category. A total of 98 participant evaluations were considered for Experiment 1, 96 for Experiment 2, and 100 for Experiment 3.
  • Figure 2: Example Medical Questions by Category: Comparing Doctors' and AI-Generated Responses
  • Figure 2: The expert evaluation's questionnaires and instruction.
  • Figure 3: Expert Evaluation of AI-generated medical response accuracy. (A) The table represents the compilation of the four Physician Accuracy Evaluation scores with the values for each evaluation as follows: Yes = 3, Maybe = 2, No = 1. Using these numerical values for each expert evaluation, a compiled score was formed. Any score equal to or above 10 (with two or fewer “Maybe” evaluations) was considered High Accuracy. Any score equal to or below 9 (majority of evaluations are “Maybe” or worse) was considered Low Accuracy. (B) In a dataset of 150 AI-generated medical responses, 56.0% were of High Accuracy and 44.0% were of Low Accuracy. (C) Breakdown of High and Low Accuracy AI-generated responses across the six different medical domains.
  • ...and 10 more figures
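
The accuracy labeling rule described in the Figure 3 caption can be sketched in a few lines of Python. This is only an illustration of the stated arithmetic (Yes = 3, Maybe = 2, No = 1, summed over four physician evaluations, with 10 or above counted as High Accuracy), not the authors' code. The names SCORE, compiled_score, and accuracy_label are hypothetical, and the sketch applies only the numeric threshold, not the caption's parenthetical descriptions.

```python
# Numeric value of each physician evaluation, as given in the Figure 3 caption.
SCORE = {"Yes": 3, "Maybe": 2, "No": 1}

# Compiled scores at or above this value count as High Accuracy (per the caption).
HIGH_ACCURACY_THRESHOLD = 10

# The caption describes four physician evaluations per response.
EVALUATIONS_PER_RESPONSE = 4


def compiled_score(evaluations):
    """Sum the numeric values of the four physician evaluations."""
    if len(evaluations) != EVALUATIONS_PER_RESPONSE:
        raise ValueError("expected exactly four physician evaluations")
    return sum(SCORE[e] for e in evaluations)


def accuracy_label(evaluations):
    """Label a response High or Low Accuracy from its compiled score."""
    if compiled_score(evaluations) >= HIGH_ACCURACY_THRESHOLD:
        return "High Accuracy"
    return "Low Accuracy"


if __name__ == "__main__":
    # Hypothetical example input: two "Yes" and two "Maybe" evaluations.
    example = ["Yes", "Yes", "Maybe", "Maybe"]
    print(compiled_score(example), accuracy_label(example))
```

With four evaluators the compiled score ranges from 4 to 12, so a response rated "Yes" twice and "Maybe" twice sits exactly on the High Accuracy boundary.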