The largest EEG-based BCI reproducibility study for open science: the MOABB benchmark
Sylvain Chevallier, Igor Carrara, Bruno Aristimunha, Pierre Guetschel, Sara Sedlar, Bruna Lopes, Sebastien Velut, Salim Khazem, Thomas Moreau
TL;DR
The paper tackles the reproducibility gap in EEG-based BCIs by conducting the largest open, reproducible benchmark to date: 36 datasets and 30 pipelines covering three paradigms (MI, P300, SSVEP), all run within the unified MOABB framework. It shows that Riemannian geometry-based classifiers, especially tangent-space variants, consistently outperform raw-signal and deep learning pipelines, and that deep learning needs a large number of trials to become competitive. The study also assesses environmental impact with CodeCarbon and provides a transparent, open-access platform for ongoing benchmarking and cross-dataset comparisons. Collectively, these contributions advance rigor, transparency, and scalability in BCI research, enabling robust cross-study comparisons and guiding practical experimental design.
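To make "tangent-space variant" concrete, here is a minimal numpy sketch of what such a pipeline does: estimate one spatial covariance matrix per trial, choose a reference point, map each matrix to the tangent space at that point with the matrix logarithm, and vectorize the result so that an ordinary Euclidean classifier can be applied. Assumptions, all loudly illustrative: the log-Euclidean mean stands in for the affine-invariant Riemannian mean usually used as the reference point, a nearest-centroid rule stands in for the linear classifiers (such as logistic regression) typically trained on tangent vectors, the two-class data are synthetic, and every function name is hypothetical. This is not the benchmark's code.

```python
import numpy as np


def covariances(trials):
    """Sample spatial covariance per trial; trials has shape (n_trials, n_channels, n_times)."""
    centered = trials - trials.mean(axis=2, keepdims=True)
    return np.einsum("nct,ndt->ncd", centered, centered) / (trials.shape[2] - 1)


def _eig_fn(mat, fn):
    """Apply a scalar function to the eigenvalues of a symmetric matrix."""
    vals, vecs = np.linalg.eigh(mat)
    return (vecs * fn(vals)) @ vecs.T


def log_euclidean_mean(covs):
    """Log-Euclidean mean (assumed stand-in for the affine-invariant Riemannian mean)."""
    return _eig_fn(np.mean([_eig_fn(c, np.log) for c in covs], axis=0), np.exp)


def tangent_space(covs, ref):
    """Map SPD matrices to the tangent space at `ref` and vectorize the upper triangle."""
    n = ref.shape[0]
    ref_isqrt = _eig_fn(ref, lambda v: 1.0 / np.sqrt(v))
    rows, cols = np.triu_indices(n)
    # sqrt(2) on off-diagonal terms keeps the vector norm equal to the matrix Frobenius norm.
    weights = np.where(rows == cols, 1.0, np.sqrt(2.0))
    feats = []
    for c in covs:
        s = _eig_fn(ref_isqrt @ c @ ref_isqrt, np.log)  # whiten by the reference, then log-map
        feats.append(weights * s[rows, cols])
    return np.array(feats)


# Synthetic 4-channel data: class 0 has high variance on channel 0, class 1 on channel 1.
rng = np.random.default_rng(0)


def simulate(n_trials, gains):
    x = rng.standard_normal((n_trials, 4, 200))
    return x * np.asarray(gains, dtype=float)[None, :, None]


X_train = np.concatenate([simulate(20, [3, 1, 1, 1]), simulate(20, [1, 3, 1, 1])])
y_train = np.repeat([0, 1], 20)
X_test = np.concatenate([simulate(10, [3, 1, 1, 1]), simulate(10, [1, 3, 1, 1])])
y_test = np.repeat([0, 1], 10)

C_train = covariances(X_train)
ref = log_euclidean_mean(C_train)  # reference point estimated on training trials only
F_train = tangent_space(C_train, ref)
F_test = tangent_space(covariances(X_test), ref)

# Nearest-centroid rule in the tangent space (placeholder for a linear classifier).
centroids = np.stack([F_train[y_train == k].mean(axis=0) for k in (0, 1)])
dists = np.linalg.norm(F_test[:, None, :] - centroids[None, :, :], axis=2)
y_pred = dists.argmin(axis=1)
accuracy = (y_pred == y_test).mean()
print(F_train.shape, accuracy)
```

Estimating the reference point on training trials only keeps test data out of the feature mapping. For real use, pyRiemann's `TangentSpace` transformer provides a maintained implementation of this mapping that fits into scikit-learn pipelines.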
Abstract
Objective. This study conducts an extensive reproducibility analysis of brain-computer interfaces (BCIs) on open electroencephalography (EEG) datasets, aiming to assess existing solutions and to establish open and reproducible benchmarks for effective comparison within the field. Such a benchmark is needed because rapid industrial progress has given rise to undisclosed proprietary solutions. Furthermore, the scientific literature is dense and often features evaluations that are hard to reproduce, which makes comparing existing approaches arduous.
Approach. Within an open framework, 30 machine learning pipelines (raw signal: 11, Riemannian: 13, deep learning: 6) are meticulously re-implemented and evaluated across 36 publicly available datasets, covering motor imagery (14), P300 (15), and SSVEP (7). Results are assessed with statistical meta-analysis techniques, and the analysis also considers execution time and environmental impact.
Main results. The study yields principled and robust results across the BCI paradigms considered: motor imagery, P300, and SSVEP. Notably, Riemannian approaches based on spatial covariance matrices exhibit superior performance, while deep learning techniques require large volumes of data to achieve competitive outcomes. The complete results are openly accessible, paving the way for future research to further enhance reproducibility in the BCI domain.
Significance. This study establishes a rigorous and transparent benchmark for BCI research, offers insights into optimal methodologies, and highlights the importance of reproducibility in driving advances in the field.
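The abstract mentions statistical meta-analysis across datasets. One standard way to do this, and the one assumed here, is to compare two pipelines separately on each dataset using paired per-subject scores, then pool the evidence with a Stouffer Z-score combination weighted by the square root of each dataset's subject count, alongside a weighted mean of per-dataset standardized effect sizes. The sketch uses only the Python standard library and an exact one-sided sign-flip permutation test (feasible only for small subject counts). The weighting scheme, the function names, and the accuracies are illustrative assumptions, not the study's actual procedure or results.

```python
from itertools import product
from math import sqrt
from statistics import NormalDist, mean, stdev


def paired_permutation_pvalue(scores_a, scores_b):
    """Exact one-sided sign-flip test of H1: pipeline A scores higher than pipeline B."""
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    observed = mean(diffs)
    count = total = 0
    for signs in product((1, -1), repeat=len(diffs)):  # 2**n sign flips: small n only
        total += 1
        if mean(s * d for s, d in zip(signs, diffs)) >= observed - 1e-12:
            count += 1
    return count / total


def stouffer(p_values, n_subjects):
    """Weighted Stouffer combination of one-sided p-values, weights = sqrt(n_subjects)."""
    norm = NormalDist()
    weights = [sqrt(n) for n in n_subjects]
    # Clamp so inv_cdf stays inside its open domain (0, 1).
    z = [norm.inv_cdf(1.0 - min(max(p, 1e-12), 1 - 1e-12)) for p in p_values]
    z_comb = sum(w * zi for w, zi in zip(weights, z)) / sqrt(sum(w * w for w in weights))
    return 1.0 - norm.cdf(z_comb)


def combine_effects(effects, n_subjects):
    """Mean of per-dataset effect sizes, weighted by sqrt(n_subjects)."""
    weights = [sqrt(n) for n in n_subjects]
    return sum(w * e for w, e in zip(weights, effects)) / sum(weights)


# Hypothetical per-subject accuracies (pipeline A, pipeline B); illustration only.
datasets = {
    "dataset_1": ([0.81, 0.74, 0.90, 0.68, 0.77], [0.75, 0.70, 0.86, 0.69, 0.71]),
    "dataset_2": ([0.92, 0.88, 0.95, 0.85], [0.89, 0.84, 0.93, 0.86]),
    "dataset_3": ([0.66, 0.71, 0.63, 0.70, 0.69, 0.74], [0.64, 0.66, 0.65, 0.67, 0.63, 0.70]),
}

p_values, effects, n_subjects = [], [], []
for scores_a, scores_b in datasets.values():
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    p_values.append(paired_permutation_pvalue(scores_a, scores_b))
    effects.append(mean(diffs) / stdev(diffs))  # standardized mean difference of paired scores
    n_subjects.append(len(scores_a))

combined_p = stouffer(p_values, n_subjects)
combined_effect = combine_effects(effects, n_subjects)
print("combined p:", combined_p)
print("combined effect:", combined_effect)
```

Weighting by the square root of the subject count gives larger datasets more influence without letting a single large dataset dominate. For datasets with many subjects, the exhaustive enumeration (which grows as 2^n) would be replaced by a Wilcoxon signed-rank test or a sampled permutation test.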
