Adapting and Evaluating Multimodal Large Language Models for Adolescent Idiopathic Scoliosis Self-Management: A Divide and Conquer Framework

Zhaolong Wu; Pu Luo; Nan Meng; Jason Pui Yin Cheung; Teng Zhang

TL;DR

Adolescent Idiopathic Scoliosis (AIS) self-management requires reliable multimodal interpretation beyond imaging. This paper introduces a Divide and Conquer framework that decomposes AIS care into Visual Spinal Assessment, Domain Knowledge Assessment, and Patient Education Counseling, augmented by spinal keypoint prompting and retrieval-augmented generation (RAG). It provides a large-scale AP spine X-ray dataset (~3,683 images from 3,022 patients) and AIS-specific knowledge resources to systematically evaluate MLLMs, reporting model- and task-dependent gains from visual prompts and substantial knowledge gains from RAG. The findings show current MLLMs fall short of personalized AIS care, but targeted adaptations offer a concrete pathway to improved, patient-facing AIS support.

Abstract

This study presents the first comprehensive evaluation of Multimodal Large Language Models (MLLMs) for Adolescent Idiopathic Scoliosis (AIS) self-management. We constructed a database of approximately 3,000 anteroposterior X-rays with diagnostic texts and evaluated five MLLMs through a "Divide and Conquer" framework consisting of three tasks: visual question answering, domain knowledge assessment, and patient education counseling assessment. Our investigation revealed that MLLMs are limited in their ability to interpret complex spinal radiographs and to comprehend AIS care knowledge. To address these two limitations, we enhanced MLLMs with spinal keypoint prompting and compiled an AIS knowledge base for retrieval-augmented generation (RAG), respectively. The effectiveness of visual prompting varied across architectures, while RAG substantially improved the models' performance on the knowledge assessment task. Our findings indicate that current MLLMs are far from capable of serving as personalized assistants in AIS care. The greatest challenge lies in accurately detecting the location of spinal deformities (best accuracy: 0.55) and their direction (best accuracy: 0.13).
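To make the RAG step mentioned in the abstract more concrete, the following is a minimal sketch of retrieval-augmented prompt construction. It is not the authors' pipeline: the three knowledge-base passages, the word-overlap scoring, the function names (`tokenize`, `score`, `retrieve`, `build_prompt`), and the prompt template are all assumptions chosen for illustration. A real system would likely use dense embeddings and a curated AIS knowledge base.

```python
import re
from collections import Counter

# Hypothetical stand-in passages; the paper's AIS knowledge base is not reproduced here.
KNOWLEDGE_BASE = [
    "The Cobb angle quantifies the magnitude of a spinal curve on a radiograph.",
    "Scoliosis is commonly defined as a Cobb angle of at least 10 degrees.",
    "Bracing aims to limit curve progression while the patient is still growing.",
]


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def score(query, passage):
    """Count tokens shared by query and passage (assumed toy relevance score)."""
    q = Counter(tokenize(query))
    p = Counter(tokenize(passage))
    return sum(min(q[t], p[t]) for t in q)


def retrieve(query, passages, k=2):
    """Return the k passages with the highest overlap score."""
    ranked = sorted(passages, key=lambda p: score(query, p), reverse=True)
    return ranked[:k]


def build_prompt(question, passages, k=2):
    """Prepend the retrieved passages to the question as context for a model."""
    context = "\n".join(f"- {p}" for p in retrieve(question, passages, k))
    return f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"


if __name__ == "__main__":
    print(build_prompt("What does bracing aim to do?", KNOWLEDGE_BASE, k=1))
```

The resulting string would be passed to a language model in place of the bare question, so the model can ground its answer in the retrieved passages rather than relying only on its parametric knowledge.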
