Between Knowledge and Care: Evaluating Generative AI-Based IUI in Type 2 Diabetes Management Through Patient and Physician Perspectives
Yibo Meng, Ruiqi Chen, Bingyi Liu, Yan Guan, Xiaolan Ding
TL;DR
This study examines how generative AI-based IUI supports Type 2 diabetes management by combining patient experiences with physician evaluations in China. It develops a real-world benchmark of 66 patient questions across seven domains and an accompanying five-dimensional rubric (Accuracy, Safety, Clarity, Integrity, Action Orientation), which experts use to assess four AI models. Quantitative results reveal a clear model hierarchy (ChatGPT strongest; the others lag, with variable performance) and domain-specific gaps, especially in medication guidance, interpretation, and emotional support. Qualitative insights underscore the need for trust calibration, risk-aware fallbacks, and human–AI collaboration. The study ultimately argues for task-aware orchestration and emotionally attuned interfaces to integrate AI safely into chronic-care workflows.
Abstract
Generative AI systems are increasingly adopted by patients seeking everyday health guidance, yet their reliability and clinical appropriateness remain uncertain. Taking Type 2 Diabetes Mellitus (T2DM) as a representative chronic condition, this paper presents a two-part mixed-methods study that examines how patients and physicians in China evaluate the quality and usability of AI-generated health information. Study 1 analyzes 784 authentic patient questions to identify seven core categories of informational needs and five evaluation dimensions: Accuracy, Safety, Clarity, Integrity, and Action Orientation. Study 2 involves seven endocrinologists who assess responses from four mainstream AI models across these dimensions. Quantitative and qualitative findings reveal consistent strengths in factual and lifestyle guidance but significant weaknesses in medication interpretation, contextual reasoning, and empathy. Patients view AI as an accessible "pre-visit educator," whereas clinicians highlight its lack of clinical safety and personalization. Together, the findings inform design implications for interactive health systems, advocating for multi-model orchestration, risk-aware fallback mechanisms, and emotionally attuned communication to ensure trustworthy AI assistance in chronic disease care.
