Joint-Embedding Masked Autoencoder for Self-supervised Learning of Dynamic Functional Connectivity from the Human Brain
Jungwon Choi, Hyungi Lee, Byung-Hoon Kim, Juho Lee
TL;DR
This work tackles label-scarce learning for dynamic functional connectivity by pretraining a spatio-temporal masked autoencoder (ST-JEMA) on large unlabeled fMRI data. It adapts the Joint Embedding Predictive Architecture to graphs by reconstructing latent node and edge representations across space and time, using dual encoders with EMA updates and MLP-Mixer decoders. Across eight downstream rs-fMRI benchmarks, ST-JEMA consistently outperforms static and dynamic baselines on gender, age, and psychiatric diagnosis tasks, with particular strength in data-scarce clinical settings and in scenarios where data are missing along the time axis. The approach demonstrates that reconstructing high-level semantic representations of dynamic graphs from unlabeled data yields robust, transfer-ready representations for neuroimaging phenotyping and diagnosis.
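The "dual encoders with EMA updates" idea can be illustrated with a minimal sketch. It assumes the common JEPA-style setup in which a target encoder is not trained by gradients but tracks the context encoder through an exponential moving average. The function name `ema_update`, the dict-of-lists parameter layout, and the momentum value are hypothetical choices for illustration, not details taken from the paper.

```python
def ema_update(target_params, context_params, momentum=0.99):
    """Return new target parameters as an exponential moving average.

    Rule (assumed, standard EMA form):
        target <- momentum * target + (1 - momentum) * context
    """
    if target_params.keys() != context_params.keys():
        raise ValueError("encoders must share parameter names")
    return {
        name: [
            momentum * t + (1.0 - momentum) * c
            for t, c in zip(target_params[name], context_params[name])
        ]
        for name in target_params
    }


# Toy parameters: each encoder is a dict of flat weight lists (hypothetical layout).
context = {"w": [1.0, 2.0], "b": [0.5]}
target = {"w": [0.0, 0.0], "b": [0.0]}

# One EMA step moves the target a small fraction of the way toward the context.
target = ema_update(target, context, momentum=0.9)
```

With a momentum close to 1, the target encoder changes slowly, which is the usual motivation for using it as a stable source of reconstruction targets.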
Abstract
Graph Neural Networks (GNNs) have shown promise in learning dynamic functional connectivity for distinguishing phenotypes from human brain networks. However, obtaining extensive labeled clinical data for training is often resource-intensive, which makes practical application difficult. Leveraging unlabeled data thus becomes crucial for representation learning in a label-scarce setting. Although generative self-supervised learning techniques, especially masked autoencoders, have shown promising results in representation learning across various domains, their application to dynamic graphs for dynamic functional connectivity remains underexplored and faces challenges in capturing high-level semantic representations. Here, we introduce the Spatio-Temporal Joint Embedding Masked Autoencoder (ST-JEMA), drawing inspiration from the Joint Embedding Predictive Architecture (JEPA) in computer vision. ST-JEMA employs a JEPA-inspired strategy for reconstructing dynamic graphs, which enables it to learn higher-level semantic representations that account for temporal structure, addressing the challenges of representation learning on fMRI data. Using the large-scale UK Biobank dataset for self-supervised learning, ST-JEMA learns strong representations of dynamic functional connectivity and outperforms previous methods in predicting phenotypes and psychiatric diagnoses across eight benchmark fMRI datasets, even with limited samples. Its temporal reconstruction is also effective in missing-data scenarios. These findings highlight the potential of our approach as a robust representation learning method for leveraging label-scarce fMRI data.
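As a rough illustration of what a JEPA-inspired reconstruction objective on a dynamic graph might look like, the sketch below masks random (time window, node) positions and scores predictions against target latents at those positions only. It covers node latents only, uses uniform random masking and a mean squared error, and replaces the encoders with constant stand-ins. All names (`sample_mask`, `masked_latent_mse`), sizes, and the mask ratio are assumptions for illustration and are not the authors' implementation.

```python
import random


def sample_mask(num_windows, num_nodes, mask_ratio, rng):
    """Pick (window, node) positions to hide from the context encoder."""
    positions = [(t, n) for t in range(num_windows) for n in range(num_nodes)]
    num_masked = int(round(mask_ratio * len(positions)))
    return set(rng.sample(positions, num_masked))


def masked_latent_mse(predicted, target, masked):
    """Mean squared error between latent vectors, at masked positions only."""
    if not masked:
        raise ValueError("mask is empty")
    total, count = 0.0, 0
    for t, n in masked:
        for p, q in zip(predicted[t][n], target[t][n]):
            total += (p - q) ** 2
            count += 1
    return total / count


rng = random.Random(0)
T, N, D = 4, 3, 2  # windows, ROIs (nodes), latent dimension: toy sizes
mask = sample_mask(T, N, mask_ratio=0.5, rng=rng)

# Stand-ins for encoder outputs (hypothetical): target latents are all ones,
# predicted latents are all zeros, indexed as [window][node][dimension].
target_latents = [[[1.0] * D for _ in range(N)] for _ in range(T)]
predicted_latents = [[[0.0] * D for _ in range(N)] for _ in range(T)]

loss = masked_latent_mse(predicted_latents, target_latents, mask)
```

The point of the sketch is that the loss compares latent vectors rather than raw fMRI signals or adjacency entries, and that unmasked positions contribute nothing to it.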
