Human-Centered AI in Multidisciplinary Medical Discussions: Evaluating the Feasibility of a Chat-Based Approach to Case Assessment

Shinnosuke Sawano, Satoshi Kodera

TL;DR

This study investigates the feasibility of a human-centered AI chat platform for collaborative, multidisciplinary assessment of cardiovascular cases with multimorbidity. It uses five simulated cases and a ChatGPT-4o workflow to generate AI-assisted summaries, quantify hallucinations, and compare knowledge-graph structures between multidisciplinary teams and single physicians. AI assistance reduced the time required for summarization by an average of $79.98\%$ while maintaining structured knowledge representation; the overall hallucination rate averaged $3.62\%$ (harmful: $0.49\%$). Multidisciplinary assessments produced deeper, more branched knowledge graphs with distinct centrality patterns, underscoring both the potential and the safety considerations of AI-assisted, human-centered medical decision-making in real-world workflows.

Abstract

In this study, we investigate the feasibility of using a human-centered artificial intelligence (AI) chat platform where medical specialists collaboratively assess complex cases. As the target population for this platform, we focus on patients with cardiovascular diseases who are in a state of multimorbidity, that is, suffering from multiple chronic conditions. Working with physicians, we evaluate simulated cases with multiple diseases in a chat application to assess feasibility, efficiency gains from AI use, and the quantification of discussion content. We constructed the simulated cases from past case reports, medical error reports, and complex cardiovascular cases experienced by the physicians. The analysis of discussions across five simulated cases demonstrated a significant reduction in the time required for summarization using AI, with an average reduction of 79.98%. Additionally, we examined hallucination rates in AI-generated summaries used in multidisciplinary medical discussions. The overall hallucination rate ranged from 1.01% to 5.73%, with an average of 3.62%, whereas the harmful hallucination rate ranged from 0.00% to 2.09%, with an average of 0.49%. Furthermore, morphological analysis demonstrated that multidisciplinary assessments enabled a more complex and detailed representation of medical knowledge than single physician assessments. We also examined structural differences between multidisciplinary and single physician assessments using centrality metrics derived from the knowledge graph. Overall, AI-assisted summarization significantly reduced the time required for medical discussions while maintaining structured knowledge representation. These findings support the feasibility of AI-assisted chat-based discussions as a human-centered approach to multidisciplinary medical decision-making.
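The two headline quantities in the abstract are simple ratios, and a short sketch may make them concrete. The abstract does not spell out the exact definitions, so the code below assumes that the time reduction is measured relative to the time without AI, and that a hallucination rate is the share of summary units (for example, statements) judged to be hallucinated. The function names and the input numbers are hypothetical and are not taken from the study.

```python
def percent_reduction(baseline_minutes: float, assisted_minutes: float) -> float:
    """Time saved with AI assistance, as a percentage of the baseline time."""
    if baseline_minutes <= 0:
        raise ValueError("baseline_minutes must be positive")
    return (baseline_minutes - assisted_minutes) / baseline_minutes * 100.0


def hallucination_rate(flagged_units: int, total_units: int) -> float:
    """Share of summary units flagged as hallucinated, as a percentage.

    Assumption: the rate is flagged units divided by all units in the summary.
    """
    if total_units <= 0:
        raise ValueError("total_units must be positive")
    if not 0 <= flagged_units <= total_units:
        raise ValueError("flagged_units must lie between 0 and total_units")
    return flagged_units / total_units * 100.0


# Illustrative inputs only; these are NOT measurements from the study.
reduction = percent_reduction(baseline_minutes=50.0, assisted_minutes=10.0)
overall = hallucination_rate(flagged_units=3, total_units=100)
harmful = hallucination_rate(flagged_units=1, total_units=100)
print(f"time reduction: {reduction:.2f}%")
print(f"overall hallucination rate: {overall:.2f}%, harmful: {harmful:.2f}%")
```

Under these assumed definitions the harmful rate can never exceed the overall rate for the same summary, because harmful hallucinations are a subset of all hallucinations. This matches the ordering of the reported averages.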

Paper Structure

This paper contains 15 sections, 4 figures, 3 tables.

Figures (4)

  • Figure 1: Time-Saving Impact of Generative AI. Illustration of the time-saving impact of generative AI in case discussions using a comparison of the time required for case assessments with and without AI support. The results demonstrated that AI-assisted discussions led to a 79.98% reduction in the time required for case assessment.
  • Figure 1: Visualization of Degree Centrality for Comparative Analysis. Degree centrality visualization highlights nodes with a high number of direct connections, representing the prominence of specific medical concepts within discussions. Degree centrality was significantly higher in single physician assessments (0.084 ± 0.062) than in multidisciplinary discussions (0.058 ± 0.043; p < 0.001). To facilitate visual comparison, the degree centrality values for both groups were scaled by the same fixed factor before node visualization, so that differences between the two groups are easier to see.
  • Figure 2: Comparison of Overall and Harmful Hallucination Rates. Illustration of the hallucination rates observed in AI-generated summaries used in multidisciplinary medical discussions. The overall hallucination rate ranged from 1.01% to 5.73%, with an average of 3.62%. The harmful hallucination rate ranged from 0.00% to 2.09%, with an average of 0.49%.
  • Figure 2: Visualization of Betweenness Centrality for Comparative Analysis. Betweenness centrality visualization identifies key intermediary nodes that act as bridges in the knowledge network, reflecting the flow of clinical reasoning and decision-making pathways. Betweenness centrality was significantly higher in multidisciplinary teams (0.012 ± 0.030) than in single physician assessments (0.002 ± 0.002; p < 0.001). To facilitate visual comparison, the betweenness centrality values for both groups were scaled by the same fixed factor before node visualization, so that differences between the two groups are easier to see.
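
The two centrality measures named in the captions above can be illustrated on toy graphs. The sketch below is a minimal pure-Python version under stated assumptions: the graph is undirected and unweighted, degree centrality is normalized by n − 1, and betweenness centrality is normalized by (n − 1)(n − 2)/2 and computed with Brandes' algorithm. The summary does not say which tooling or normalization the authors used, and the node labels are hypothetical placeholders, not concepts from the study's knowledge graphs.

```python
from collections import deque


def build_graph(edges):
    """Build an undirected adjacency map {node: set(neighbors)} from edge pairs."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def degree_centrality(adj):
    """Fraction of the other nodes that each node is directly connected to."""
    n = len(adj)
    if n < 2:
        return {v: 0.0 for v in adj}
    return {v: len(nbrs) / (n - 1) for v, nbrs in adj.items()}


def betweenness_centrality(adj):
    """Normalized shortest-path betweenness via Brandes' algorithm (unweighted)."""
    bc = {v: 0.0 for v in adj}
    for s in adj:
        # Breadth-first search from s, counting shortest paths (sigma).
        order = []
        preds = {v: [] for v in adj}
        sigma = {v: 0 for v in adj}
        dist = {v: -1 for v in adj}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate pair dependencies, farthest nodes first.
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                bc[w] += delta[w]
    n = len(adj)
    # Each unordered pair was counted twice, so dividing by (n-1)(n-2)
    # both halves the total and normalizes by (n-1)(n-2)/2.
    scale = 1.0 / ((n - 1) * (n - 2)) if n > 2 else 0.0
    return {v: value * scale for v, value in bc.items()}


# Hypothetical toy graphs: a hub-and-spoke layout and a chain of concepts.
star = build_graph([("hub", leaf) for leaf in ("a", "b", "c", "d")])
chain = build_graph([("a", "b"), ("b", "c"), ("c", "d"), ("d", "e")])

for name, graph in (("star", star), ("chain", chain)):
    print(name, "degree:", degree_centrality(graph))
    print(name, "betweenness:", betweenness_centrality(graph))
```

In the star graph the hub scores highest on both measures, whereas in the chain the interior nodes have modest degree but carry most of the shortest paths. This is the kind of contrast the two captions describe: degree reflects how many direct links a concept has, while betweenness reflects how often it bridges other concepts.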