
AGIR: Assessing 3D Gait Impairment with Reasoning based on LLMs

Diwei Wang, Cédric Bobenrieth, Hyewon Seo

TL;DR

The paper tackles objective, interpretable gait impairment assessment in neurodegenerative diseases from video data. It introduces AGIR, a pipeline that pairs a pre-trained motion tokenizer with a large language model fine-tuned on motion tokens and chain-of-thought (CoT) reasoning. A two-stage supervised fine-tuning strategy first aligns motions with analytic descriptions and then performs CoT-based impairment assessment. A multimodal, rationale-enriched dataset, together with knowledge-aware prompts and numerically embedded gait parameters, enables robust cross-modal alignment and natural-language decoding of gait findings. On two tasks, gait scoring and dementia subtyping, AGIR outperforms state-of-the-art methods under limited data, delivering interpretable, rationale-backed impairment assessments and highlighting the potential of integrating patient metadata into video-based clinical decision support.

Abstract

Assessing gait impairment plays an important role in early diagnosis, disease monitoring, and treatment evaluation for neurodegenerative diseases. Despite its widespread use in clinical practice, such assessment is limited by subjectivity and a lack of precision. While recent deep learning-based approaches have consistently improved classification accuracy, they often lack interpretability, which hinders their utility in clinical decision-making. To overcome these challenges, we introduce AGIR, a novel pipeline consisting of a pre-trained VQ-VAE motion tokenizer followed by a Large Language Model (LLM) fine-tuned on pairs of motion tokens and Chain-of-Thought (CoT) reasoning. To fine-tune an LLM for pathological gait analysis, we first build a multimodal dataset by adding rationales dedicated to MDS-UPDRS gait score assessment to an existing PD gait dataset. We then introduce a two-stage supervised fine-tuning (SFT) strategy to enhance the LLM's motion comprehension with pathology-specific knowledge. This strategy comprises (1) a generative stage that aligns gait motions with analytic descriptions through bidirectional motion-description generation, and (2) a reasoning stage that integrates logical CoT reasoning for impairment assessment with the UPDRS gait score. Validation on an existing dataset and comparisons with state-of-the-art methods confirm the robustness and accuracy of our pipeline, demonstrating its ability to assign gait impairment scores from motion input with clinically meaningful rationales.
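
To make the motion-tokenizer step concrete, the following is a minimal sketch of the vector-quantization lookup that a VQ-VAE performs, under stated assumptions. It is not the authors' implementation: the toy codebook, the hand-written latent vectors (in a real VQ-VAE these come from a learned encoder), the helper names `quantize` and `to_motion_tokens`, and the `<motion_k>` token format are all hypothetical and chosen only for illustration.

```python
from typing import List, Sequence

Vector = Sequence[float]


def squared_distance(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantize(latents: List[Vector], codebook: List[Vector]) -> List[int]:
    """Map each latent vector to the index of its nearest codebook entry."""
    return [
        min(range(len(codebook)), key=lambda k: squared_distance(z, codebook[k]))
        for z in latents
    ]


def to_motion_tokens(indices: List[int]) -> str:
    """Render codebook indices as special tokens (hypothetical format)."""
    body = "".join(f"<motion_{i}>" for i in indices)
    return "<motion_start>" + body + "<motion_end>"


# Toy 2-D codebook with three entries (assumption; real codebooks are learned).
codebook = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]

# Hand-written stand-ins for per-frame encoder outputs (assumption).
latents = [(0.1, -0.1), (0.9, 0.2), (0.2, 0.8), (0.95, 0.05)]

indices = quantize(latents, codebook)
prompt_fragment = to_motion_tokens(indices)
print(indices)
print(prompt_fragment)
```

In a setup like this, the discrete indices can be added to the LLM's vocabulary as extra tokens, so a gait sequence and its textual rationale can share one token stream during fine-tuning.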

Paper Structure

This paper contains 10 sections, 7 equations, 4 figures, 1 table.

Figures (4)

  • Figure 1: Overview of our cross-modality model for video-based clinical gait analysis (left), alongside the clinical gait notions and per-class descriptions of gait classes used for prompt initialization (right). The three colored blocks represent the text-encoding pipeline, the video-encoding pipeline, and the text embedding of numerical gait parameters, respectively.
  • Figure 2: Translation of gait parameters into text.
  • Figure 3: Numerical text embedding process using the frozen CLIP text encoder.
  • Figure 4: Feature visualization using UMAP (number of components $= 3$) for numerical text embeddings derived from gait parameters. Yellow points in (b) represent the projections of the learned per-class text features. Images rendered with www.polyscope.run.