Cross-Modal Fine-Tuning of 3D Convolutional Foundation Models for ADHD Classification with Low-Rank Adaptation

Jyun-Ping Kao, Shinyeong Rho, Shahar Lazarev, Hyun-Hae Cho, Fangxu Xing, Taehoon Shin, C. -C. Jay Kuo, Jonghye Woo

TL;DR

The paper tackles ADHD classification from diffusion MRI and demonstrates cross-modal transfer by adapting a CT-pretrained 3D convolutional backbone with Low-Rank Adaptation (LoRA) for parameter-efficient fine-tuning. The backbone is frozen, and only LoRA adapters on the 3D convolutional layers and a small MLP head are trained, so just 1.64 million parameters are updated (over 113× fewer than full fine-tuning). In five-fold cross-validation on a public diffusion MRI ADHD dataset, one model variant reaches an accuracy of 0.719 and another an AUC of 0.716, which the authors report as state-of-the-art and as evidence that CT-to-MRI cross-modal transfer is feasible in neuroimaging. The work highlights the potential of modality-agnostic representations for efficient clinical transfer learning, though broader multi-site validation is needed to confirm generalizability.

Abstract

Early diagnosis of attention-deficit/hyperactivity disorder (ADHD) in children plays a crucial role in improving outcomes in education and mental health. Diagnosing ADHD using neuroimaging data, however, remains challenging due to heterogeneous presentations and overlapping symptoms with other conditions. To address this, we propose a novel parameter-efficient transfer learning approach that adapts a large-scale 3D convolutional foundation model, pre-trained on CT images, to an MRI-based ADHD classification task. Our method introduces Low-Rank Adaptation (LoRA) in 3D by factorizing 3D convolutional kernels into 2D low-rank updates, dramatically reducing trainable parameters while achieving superior performance. In a five-fold cross-validated evaluation on a public diffusion MRI database, our 3D LoRA fine-tuning strategy achieved state-of-the-art results, with one model variant reaching 71.9% accuracy and another attaining an AUC of 0.716. Both variants use only 1.64 million trainable parameters (over 113x fewer than a fully fine-tuned foundation model). Our results represent one of the first successful cross-modal (CT-to-MRI) adaptations of a foundation model in neuroimaging, establishing a new benchmark for ADHD classification while greatly improving efficiency.
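To make the parameter savings concrete, the following minimal sketch counts the weights of one 3D convolution and of a low-rank update to it. It assumes the 3D kernel is flattened into a 2D matrix of shape C_out × (C_in·k³) and that the update is a product of two rank-r factors; this is one common reading of "factorizing 3D convolutional kernels into 2D low-rank updates", not a confirmed description of the authors' implementation. The channel widths, kernel size, and rank below are hypothetical and are not taken from the paper. The last helper only restates the abstract's arithmetic: 1.64 million trainable parameters at a ratio of over 113× implies a fully fine-tuned model of at least roughly 185 million parameters.

```python
def conv3d_params(c_in: int, c_out: int, k: int) -> int:
    """Weights in a dense 3D convolution with a k*k*k kernel (bias ignored)."""
    return c_out * c_in * k ** 3


def lora_params(c_in: int, c_out: int, k: int, rank: int) -> int:
    """Weights in a rank-`rank` update B @ A to the flattened kernel.

    Assumed shapes: B is (c_out x rank), A is (rank x c_in * k**3).
    """
    return rank * (c_out + c_in * k ** 3)


def implied_full_params(trainable: float, ratio: float) -> float:
    """Lower bound on the full fine-tuning size implied by a reduction ratio."""
    return trainable * ratio


# Hypothetical layer: 256 -> 256 channels, 3x3x3 kernel, rank 4.
full = conv3d_params(256, 256, 3)    # 1,769,472 weights
lora = lora_params(256, 256, 3, 4)   # 28,672 weights
print(f"dense: {full:,}  low-rank update: {lora:,}")

# Figures quoted in the abstract: 1.64 M trainable, over 113x fewer.
print(f"implied full model: >= {implied_full_params(1.64e6, 113) / 1e6:.1f} M")
```

The per-layer saving grows with channel width and kernel size, because the dense count scales with the product C_out·C_in·k³ while the low-rank count scales with the sum C_out + C_in·k³.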

Paper Structure

This paper contains 11 sections, 2 figures, 1 table.

Figures (2)

  • Figure 1: Overview of the proposed ADHD classification pipeline. Input: A 3D tensor with two channels (FA and MD). Backbone: The input is processed by a pre-trained 3D foundation model (FM) [pai2024foundation]. Multilayer Perceptron (MLP): Performs the final binary classification of ADHD versus Healthy Volunteer (HV).
  • Figure 2: Illustration of the proposed LoRA applied within a bottleneck residual block. Left: The standard residual block architecture. Our trainable LoRA modules are injected in parallel to each 3D convolutional layer. Right: A detailed view of a LoRA module.
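
The parallel injection described for Figure 2 can be illustrated with a small pure-Python sketch. It is not the authors' code: it assumes the flattened-kernel form of LoRA, with a frozen weight matrix W and trainable factors A and B, and it evaluates a single output location of a 3D convolution by acting on one flattened input patch of length C_in·k³. The class name `LoRAConvPatch`, the `alpha / rank` scaling, and the zero initialization of B are conventions borrowed from standard LoRA and are assumptions here. Because convolution is linear, the parallel branch is equivalent to adding the low-rank product to the frozen kernel, which the `merged_weight` method makes explicit.

```python
import random


def matvec(M, v):
    """Multiply matrix M (list of rows) by vector v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def matmul(B, A):
    """Multiply B (m x r) by A (r x n), both given as lists of rows."""
    cols = list(zip(*A))
    return [[sum(b * a for b, a in zip(row, col)) for col in cols] for row in B]


class LoRAConvPatch:
    """Frozen flattened conv kernel W with a parallel low-rank branch B @ A.

    Operates on one flattened input patch of length c_in * k**3, i.e. one
    output location of a 3D convolution (hypothetical illustration).
    """

    def __init__(self, c_out, c_in, k, rank, alpha=1.0, seed=0):
        rng = random.Random(seed)
        n = c_in * k ** 3
        # Frozen pre-trained kernel, flattened to (c_out x n).
        self.W = [[rng.gauss(0, 0.1) for _ in range(n)] for _ in range(c_out)]
        # Trainable down-projection (rank x n).
        self.A = [[rng.gauss(0, 0.1) for _ in range(n)] for _ in range(rank)]
        # Trainable up-projection (c_out x rank), zero so training starts at W.
        self.B = [[0.0] * rank for _ in range(c_out)]
        self.scale = alpha / rank

    def forward(self, patch):
        """Frozen path plus the scaled low-rank path, computed in parallel."""
        base = matvec(self.W, patch)
        low_rank = matvec(self.B, matvec(self.A, patch))
        return [b + self.scale * l for b, l in zip(base, low_rank)]

    def merged_weight(self):
        """Single equivalent kernel W + scale * (B @ A) for inference."""
        delta = matmul(self.B, self.A)
        return [[w + self.scale * d for w, d in zip(w_row, d_row)]
                for w_row, d_row in zip(self.W, delta)]


layer = LoRAConvPatch(c_out=4, c_in=2, k=3, rank=2, alpha=2.0, seed=1)
patch = [0.01 * i for i in range(2 * 3 ** 3)]
print(layer.forward(patch))  # equals the frozen output while B is all zeros
```

With B initialized to zero, the adapted layer reproduces the frozen backbone exactly at the start of training; only A and B would receive gradient updates in an actual training loop.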