Across-subject ensemble-learning alleviates the need for large samples for fMRI decoding
Himanshu Aggarwal, Liza Al-Shikhley, Bertrand Thirion
TL;DR
Decoding cognitive states from fMRI is hindered by a high feature-to-sample ratio and poor cross-subject generalization. The authors propose an across-subject ensemble-learning (stacking) approach that pre-trains per-subject classifiers and trains a final meta-learner on their predictions to decode a new subject, evaluated across five datasets with both voxel-space and DiFuMo features and multiple classifiers. The ensemble method yields up to ~20% accuracy gains over conventional decoding, especially when per-subject data are scarce, with full-voxel pre-training and an MLP final classifier providing robust performance. These results demonstrate that cross-subject pre-training can significantly reduce the per-subject data requirements, supporting more efficient fMRI decoding for real-time BCI and cognitive research, while suggesting avenues for deeper pre-training and functional alignment to further enhance cross-subject transfer.
Abstract
Decoding cognitive states from functional magnetic resonance imaging is central to understanding the functional organization of the brain. Within-subject decoding avoids between-subject correspondence problems but requires large sample sizes to make accurate predictions; obtaining such large sample sizes is both challenging and expensive. Here, we investigate an ensemble approach to decoding that combines the classifiers trained on data from other subjects to decode cognitive states in a new subject. We compare it with the conventional decoding approach on five different datasets and cognitive tasks. We find that it outperforms the conventional approach by up to 20% in accuracy, especially for datasets with limited per-subject data. The ensemble approach is particularly advantageous when the classifier is trained in voxel space. Furthermore, a Multi-layer Perceptron turns out to be a good default choice as an ensemble method. These results show that the pre-training strategy reduces the need for large per-subject data.
