RTGMFF: Enhanced fMRI-based Brain Disorder Diagnosis via ROI-driven Text Generation and Multimodal Feature Fusion

Junhao Jia, Yifei Sun, Yunyou Liu, Cheng Yang, Changmiao Wang, Feiwei Qin, Yong Peng, Wenwen Min

TL;DR

RTGMFF is a framework that unifies automatic ROI-level text generation with multimodal feature fusion for brain-disorder diagnosis. It surpasses current methods in diagnostic accuracy, with notable gains in sensitivity, specificity, and area under the ROC curve.

Abstract

Functional magnetic resonance imaging (fMRI) is a powerful tool for probing brain function, yet reliable clinical diagnosis is hampered by low signal-to-noise ratios, inter-subject variability, and the limited frequency awareness of prevailing CNN- and Transformer-based models. Moreover, most fMRI datasets lack textual annotations that could contextualize regional activation and connectivity patterns. We introduce RTGMFF, a framework that unifies automatic ROI-level text generation with multimodal feature fusion for brain-disorder diagnosis. RTGMFF consists of three components: (i) an ROI-driven fMRI text generation module that deterministically condenses each subject's activation, connectivity, age, and sex into reproducible text tokens; (ii) a hybrid frequency-spatial encoder that fuses a hierarchical Wavelet-Mamba branch with a cross-scale Transformer encoder to capture frequency-domain structure alongside long-range spatial dependencies; and (iii) an adaptive semantic alignment module that embeds the ROI token sequence and visual features in a shared space, using a regularized cosine-similarity loss to narrow the modality gap. Extensive experiments on the ADHD-200 and ABIDE benchmarks show that RTGMFF surpasses current methods in diagnostic accuracy, achieving notable gains in sensitivity, specificity, and area under the ROC curve. Code is available at https://github.com/BeistMedAI/RTGMFF.
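
The abstract mentions a regularized cosine-similarity loss for aligning text and visual embeddings but does not give its form. The following is a minimal sketch of one plausible shape for such a loss, not the authors' implementation: the function names, the L2 penalty used as the regularizer, and the `reg_weight` parameter are all assumptions made for illustration.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    # Small epsilon guards against division by zero for all-zero vectors.
    return dot / (norm_u * norm_v + 1e-12)


def alignment_loss(text_emb, visual_emb, reg_weight=0.01):
    """Mean (1 - cosine) over paired embeddings plus an L2 penalty.

    ASSUMPTION: the L2 penalty and reg_weight are hypothetical stand-ins
    for the unspecified regularizer mentioned in the abstract.
    """
    n = len(text_emb)
    # Alignment term: 0 when each text/visual pair points the same way.
    cos_term = sum(
        1.0 - cosine_similarity(t, v) for t, v in zip(text_emb, visual_emb)
    ) / n
    # Regularization term: mean squared norm of both embeddings.
    l2_term = sum(
        sum(x * x for x in t) + sum(x * x for x in v)
        for t, v in zip(text_emb, visual_emb)
    ) / n
    return cos_term + reg_weight * l2_term


if __name__ == "__main__":
    text = [[1.0, 0.0], [0.0, 1.0]]
    visual = [[1.0, 0.0], [1.0, 0.0]]
    print(alignment_loss(text, visual, reg_weight=0.0))
```

In this sketch the cosine term pulls paired embeddings toward the same direction, while the penalty only discourages large norms; the actual regularizer in RTGMFF may differ.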

Paper Structure

This paper contains 16 sections, 12 equations, 5 figures, 2 tables.

Figures (5)

  • Figure 1: Overview of the proposed RTGMFF pipeline. Panel (a) illustrates the ROI-driven fMRI Text Generation, panel (b) the Hierarchical Wavelet-Mamba architecture, and panel (c) the Cross-Scale Transformer Encoder; panels (e) and (f) detail the Mamba Block and Cross Attention. Panel (d) shows the network structure of the Adaptive Semantic Alignment Module.
  • Figure 2: Left, a deterministic Jinja2 template that converts subject demographics and ROI activation triplets into radiology-style prose; right, the resulting report sentence for a 14-year-old boy showing varied regional activations and de-activations.
  • Figure 3: Plots of model performance on test set versus settings of hyperparameters $\alpha$ and $\beta$.
  • Figure 4: Heatmap of macro ROI-F1 over $\tau_{1}$ (y-axis) and $\tau_{2}$ (x-axis). The white star marks the optimal setting $(\tau_{1}, \tau_{2}) = (0.15,\,0.30)$; the blank triangular region is invalid because there $\tau_{2} \le \tau_{1} + 0.02$.
  • Figure 5: Qualitative results of ROI-driven fMRI text generation. (Top) Cortical surface maps with three activation levels (light, medium, and dark red denote weak, moderate, and strong activity). (Bottom) Automatically generated clauses for the left (L) and right (R) hemispheres, illustrating the fidelity and interpretability of the proposed RFTG module.
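
The captions for Figures 2, 4, and 5 together suggest a deterministic pipeline: ROI activations are binned into three levels using the thresholds $\tau_{1}$ and $\tau_{2}$, then rendered into a sentence by a template. A minimal sketch of that idea follows. It uses plain Python string formatting rather than Jinja2 to stay dependency-free, and the function names, the use of absolute activation values, the sign convention for de-activation, and the sentence wording are all assumptions rather than details confirmed by the paper.

```python
# Optimal thresholds reported in the Figure 4 caption.
TAU_1, TAU_2 = 0.15, 0.30


def is_valid_pair(tau_1, tau_2, margin=0.02):
    """Validity rule from the Figure 4 caption: tau_2 must exceed tau_1 + 0.02."""
    return tau_2 > tau_1 + margin


def activation_level(value, tau_1=TAU_1, tau_2=TAU_2):
    """Bin an activation into weak / moderate / strong.

    ASSUMPTION: the thresholds are applied to the absolute value.
    """
    magnitude = abs(value)
    if magnitude < tau_1:
        return "weak"
    if magnitude < tau_2:
        return "moderate"
    return "strong"


def describe_subject(age, sex, roi_values):
    """Render demographics and (ROI name, activation) pairs as one sentence.

    ASSUMPTION: negative values are reported as de-activation; the
    wording is illustrative, not the template shown in Figure 2.
    """
    clauses = []
    for roi, value in roi_values:
        direction = "activation" if value >= 0 else "de-activation"
        clauses.append(f"{activation_level(value)} {direction} in {roi}")
    return f"A {age}-year-old {sex} subject shows " + ", ".join(clauses) + "."


if __name__ == "__main__":
    # Hypothetical ROI names and values, for illustration only.
    rois = [("left precuneus", 0.42), ("right insula", -0.20)]
    print(describe_subject(14, "male", rois))
```

Because every step is a fixed rule, the same inputs always yield the same text, which is the reproducibility property the abstract attributes to the text-generation component.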