Diabetica: Adapting Large Language Model to Enhance Multiple Medical Tasks in Diabetes Care and Management

Lai Wei; Zhen Ying; Muyang He; Yutong Chen; Qian Yang; Yanzhe Hong; Jiaping Lu; Kaipeng Zheng; Shaoting Zhang; Xiaoying Li; Weiran Huang; Ying Chen

TL;DR

Diabetes care faces global gaps in resources and knowledge that limit scalable, personalized management. The authors present Diabetica, a family of diabetes-specific large language models (LLMs) trained with a self-distillation pipeline on a carefully curated Diabetes-QA dataset. The work also provides a reproducible data processing pipeline (collection, filtering, augmentation, refinement) and dedicated benchmarks (multiple-choice questions, fill-in-the-blank, and dialogue). Diabetica achieves state-of-the-art performance on diabetes-specific tasks, outperforming several open-source LLMs and approaching or exceeding proprietary systems in key areas, and its utility is validated in studies of online patient consulting, medical education, and clinical record summarization. The work offers a practical path to deploying domain-specific LLMs in diabetes care, with implications for personalized patient support, clinician education, and workflow efficiency.
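The TL;DR does not spell out how the self-distillation step works. One common form of self-distillation for fine-tuning has the base model rewrite each reference answer in its own words and then trains on the rewritten targets, which keeps them close to the model's original output distribution. The sketch below illustrates only that general idea and may differ from the authors' pipeline; `QAPair`, `REWRITE_PROMPT`, `self_distill`, and the `generate` callable (a stand-in for sampling from the base model) are hypothetical names.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class QAPair:
    question: str
    answer: str


# Hypothetical rewriting prompt, for illustration only.
REWRITE_PROMPT = (
    "Rewrite the reference answer in your own words without changing any medical fact.\n"
    "Question: {question}\n"
    "Reference answer: {answer}\n"
    "Rewritten answer:"
)


def self_distill(pairs: List[QAPair], generate: Callable[[str], str]) -> List[QAPair]:
    """Return a copy of `pairs` whose answers were rewritten by the model itself.

    `generate` (hypothetical) stands in for sampling from the base model that
    will be fine-tuned, so the new targets stay close to its own output style.
    """
    distilled = []
    for pair in pairs:
        prompt = REWRITE_PROMPT.format(question=pair.question, answer=pair.answer)
        rewritten = generate(prompt).strip()
        # Keep the original reference when the model returns nothing usable.
        distilled.append(QAPair(pair.question, rewritten or pair.answer))
    return distilled


if __name__ == "__main__":
    toy = [QAPair("What does HbA1c reflect?", "Average blood glucose over about three months.")]
    stub_model = lambda prompt: "It reflects mean blood glucose over roughly the last three months."
    print(self_distill(toy, stub_model)[0].answer)
```

In practice `generate` would wrap the base checkpoint's decoding call; a fuller pipeline would likely also check that the rewrite preserves the reference's facts before accepting it.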

Abstract

Diabetes is a chronic disease with a significant global health burden, and its optimal management requires multi-stakeholder collaboration. Large language models (LLMs) have shown promise in various healthcare scenarios, but their effectiveness across diverse diabetes tasks remains unproven. Our study introduced a framework to train and validate diabetes-specific LLMs. We first developed a comprehensive data processing pipeline comprising data collection, filtering, augmentation, and refinement, and used it to build a high-quality, diabetes-specific dataset and evaluation benchmarks from scratch. Fine-tuned on this training dataset, our diabetes-specific LLM family achieved state-of-the-art proficiency across various diabetes tasks compared with other LLMs. Furthermore, clinical studies revealed potential applications of our models in diabetes care, including providing personalized healthcare, assisting medical education, and streamlining clinical tasks. Overall, the framework supports the development of diabetes-specific LLMs and highlights their potential to enhance clinical practice and provide personalized, data-driven support for diabetes management across different end users. Our code, benchmarks, and models are available at https://github.com/waltonfuture/Diabetica.
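The four processing stages named in the abstract can be pictured as composable functions. This is a minimal sketch under stated assumptions: keyword matching for diabetes relevance, removal of repeated questions after case and whitespace normalization, paraphrase-based augmentation, and answer rewriting for refinement are illustrative choices rather than the paper's documented criteria, and `DIABETES_TERMS`, `build_dataset`, `paraphrase`, and `polish` are hypothetical names.

```python
import re
from typing import Callable, Dict, Iterable, List

Record = Dict[str, str]  # {"question": ..., "answer": ...}

# Hypothetical relevance keywords; the paper's actual filtering criteria may differ.
DIABETES_TERMS = ("diabetes", "diabetic", "insulin", "glucose", "hba1c", "metformin")


def collect(*sources: Iterable[Record]) -> List[Record]:
    """Collection: merge records from several sources into one list."""
    return [record for source in sources for record in source]


def filter_records(records: List[Record]) -> List[Record]:
    """Filtering: keep non-empty, diabetes-related QA pairs, dropping repeated questions."""
    seen, kept = set(), []
    for r in records:
        q, a = r.get("question", "").strip(), r.get("answer", "").strip()
        key = re.sub(r"\s+", " ", q.lower())  # normalize case and whitespace
        relevant = any(term in f"{q} {a}".lower() for term in DIABETES_TERMS)
        if q and a and relevant and key not in seen:
            seen.add(key)
            kept.append({"question": q, "answer": a})
    return kept


def augment(records: List[Record], paraphrase: Callable[[str], str]) -> List[Record]:
    """Augmentation: add a paraphrased copy of each question with the same answer."""
    out = list(records)
    for r in records:
        alt = paraphrase(r["question"]).strip()
        if alt and alt != r["question"]:
            out.append({"question": alt, "answer": r["answer"]})
    return out


def refine(records: List[Record], polish: Callable[[str], str]) -> List[Record]:
    """Refinement: rewrite each answer into a consistent, cleaned-up form."""
    return [{"question": r["question"], "answer": polish(r["answer"])} for r in records]


def build_dataset(sources, paraphrase, polish) -> List[Record]:
    """Run collection -> filtering -> augmentation -> refinement in order."""
    return refine(augment(filter_records(collect(*sources)), paraphrase), polish)
```

Keeping each stage as a separate function makes it easy to swap in a stronger component (for example, an LLM-based relevance classifier instead of keywords) without touching the rest of the pipeline.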

Paper Structure

This paper contains 39 sections, 3 equations, 11 figures, and 7 tables.

Introduction
Diabetica
Data collection
Data filtering
Data augmentation
Data refinement
Preliminary
Method
Experiments
Experimental Setup
Model Training
Baselines
Benchmark assessment
Clinical evaluation
Overview
...and 24 more sections

Figures (11)

Figure 1: Overall study design. (a) Training data were collected from various sources and then processed to obtain the final high-quality, formatted, diabetes-related dataset. (b) Diabetica was developed through fine-tuning. (c) We compared the performance of different LLMs on the multiple-choice question (MCQ), fill-in-the-blank (FB), and dialogue benchmarks. (d) Our model was then evaluated in three clinical applications: medical consulting, examination education, and clinical record summarization.
Figure 2: Performance of different LLMs on the diabetes-related benchmarks: multiple-choice questions (left) and fill-in-the-blank questions (right). Diabetica achieves the leading performance among these LLMs.
Figure 3: Scores of different LLMs on the dialogue benchmark, as judged by GPT-4 and Claude-3.5.
Figure 4: Comparison of AI-generated and doctor-delivered responses to online patient cases (n=20), based on expert panel review of (a) readability, (b) relevance, (c) correctness, (d) completeness, (e) safety, and (f) empathy, and (g) the selection of superior responses. Bar graphs show the mean ± s.e.m.; ***P < 0.001, paired Wilcoxon test.
Figure 5: Performance on medical education. (a) Accuracy of medical students, physicians at different levels, and LLMs in answering A2-type multiple-choice questions in the MCQ examination. (c) Student evaluation of the helpfulness and readability of answer explanations from Diabetica and the reference. (b) Readability and (d) helpfulness scores of answer explanations from Diabetica and the reference. ns, no significant difference (paired Wilcoxon test).
...and 6 more figures
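Figures 2 and 3 report MCQ accuracy and judge-assigned dialogue scores. The following is a minimal scoring sketch, assuming options labelled A to E, judge replies that contain a line such as "Score: 8" on a 1 to 10 scale, and that unparseable MCQ outputs count as wrong while unparseable judge replies are skipped. The regular expressions and all function names are hypothetical, and the fill-in-the-blank metric is not covered.

```python
import re
from statistics import mean
from typing import Dict, List, Optional

# Hypothetical conventions: MCQ options are labelled A-E, and a judge reply
# contains a line such as "Score: 8" on a 1-10 scale.
_ANSWER = re.compile(r"answer\s*(?:is)?\s*[:：]?\s*\(?([A-E])\b\)?", re.IGNORECASE)
_BARE = re.compile(r"\s*\(?([A-E])\)?[.)]?\s*", re.IGNORECASE)
_SCORE = re.compile(r"score\s*[:：]\s*(\d+(?:\.\d+)?)", re.IGNORECASE)


def extract_choice(response: str) -> Optional[str]:
    """Return the option letter a response commits to, or None if unparseable."""
    m = _ANSWER.search(response) or _BARE.fullmatch(response)
    return m.group(1).upper() if m else None


def mcq_accuracy(responses: List[str], gold: List[str]) -> float:
    """Fraction of responses whose extracted letter matches the gold letter."""
    if len(responses) != len(gold):
        raise ValueError("responses and gold labels must have the same length")
    if not gold:
        return 0.0
    correct = sum(extract_choice(r) == g.upper() for r, g in zip(responses, gold))
    return correct / len(gold)


def parse_score(reply: str, lo: float = 1.0, hi: float = 10.0) -> Optional[float]:
    """Take the last 'Score: x' in a judge reply; None if missing or out of range."""
    found = _SCORE.findall(reply)
    if not found:
        return None
    score = float(found[-1])
    return score if lo <= score <= hi else None


def mean_judge_scores(replies: Dict[str, Dict[str, List[str]]]) -> Dict[str, Dict[str, float]]:
    """replies[judge][model] holds raw judge replies; returns the mean valid score
    per judge and model (NaN when no reply could be parsed)."""
    table: Dict[str, Dict[str, float]] = {}
    for judge, per_model in replies.items():
        table[judge] = {}
        for model, texts in per_model.items():
            scores = [s for s in (parse_score(t) for t in texts) if s is not None]
            table[judge][model] = mean(scores) if scores else float("nan")
    return table
```

The `\b` after the option letter stops phrases like "the answer is clearly D" from being read as option C; such free-form replies fall through to `None` and are scored as wrong.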
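Figures 4 and 5 compare paired ratings with the Wilcoxon signed-rank test. To show what that test computes, here is a pure-Python sketch of the exact two-sided version: zero differences are dropped, tied absolute differences receive average ranks, and the p-value comes from counting all equally likely sign assignments with a small dynamic program. The paper's implementation is not stated here, and standard statistics packages may handle ties and zeros differently; `wilcoxon_exact` is a hypothetical name, and the toy ratings in the test are illustrative, not study data.

```python
from typing import List, Sequence, Tuple


def _signed_ranks(x: Sequence[float], y: Sequence[float]) -> List[Tuple[float, int]]:
    """Rank |x_i - y_i| (average ranks for ties), dropping zero differences."""
    if len(x) != len(y):
        raise ValueError("paired samples must have the same length")
    diffs = [a - b for a, b in zip(x, y) if a != b]
    order = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * len(diffs)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of equal absolute differences starting at i.
        while j + 1 < len(order) and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg_rank = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return [(r, 1 if d > 0 else -1) for r, d in zip(ranks, diffs)]


def wilcoxon_exact(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Exact two-sided Wilcoxon signed-rank test; returns (W+, p-value)."""
    pairs = _signed_ranks(x, y)
    n = len(pairs)
    if n == 0:
        return 0.0, 1.0
    # Average ranks are multiples of 0.5, so doubling makes them integers.
    doubled = [round(2 * r) for r, _ in pairs]
    w_plus2 = sum(d for d, (_, sign) in zip(doubled, pairs) if sign > 0)
    total = sum(doubled)
    # counts[s]: number of the 2**n equally likely sign patterns with doubled W+ == s.
    counts = [0] * (total + 1)
    counts[0] = 1
    for d in doubled:
        for s in range(total, d - 1, -1):
            counts[s] += counts[s - d]
    n_patterns = 2 ** n
    p_lower = sum(counts[: w_plus2 + 1]) / n_patterns
    p_upper = sum(counts[w_plus2:]) / n_patterns
    return w_plus2 / 2, min(1.0, 2 * min(p_lower, p_upper))
```

For example, `wilcoxon_exact([4, 5, 3, 4, 5], [3, 3, 3, 2, 4])` drops the one tied pair, finds all four remaining differences positive, and returns `(10.0, 0.125)`. Because the null distribution is built from the observed (possibly tied) ranks, the p-value stays exact even with the ties common in Likert-style ratings.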