NeuroClean: A Generalized Machine-Learning Approach to Neural Time-Series Conditioning

Manuel A. Hernandez Alonso; Michael Depass; Stephan Quessy; Numa Dancause; Ignasi Cos

TL;DR

This work tackles the challenge of automated, reproducible preprocessing of EEG/LFP time‑series by introducing NeuroClean, an unsupervised five‑step pipeline (bandpass filtering, ZapLine line noise removal, bad‑channel rejection, ICA with Cluster‑MARA, and optional epoching). The pipeline is designed to generalize across diverse experimental setups and to reduce human bias, with validation on high‑dimensional macaque LFP motor data showing substantial improvements in downstream classification metrics and spectral fidelity ($1/f$‑like brain activity). Central to NeuroClean is the Cluster‑MARA method, which uses DBSCAN clustering on MARA features to automatically reject artifactual components without spatial metadata, contributing to robust performance gains. The results suggest NeuroClean as a reproducible, scalable preprocessing foundation for neuroscience and brain‑computer interface research, with potential for broader adoption and further validation across datasets.

Abstract

Electroencephalography (EEG) and local field potentials (LFP) are two widely used techniques for recording electrical activity from the brain. These signals are used in both clinical and research settings for many applications. However, most brain recordings are contaminated by a myriad of artifacts and noise sources other than the brain itself. Thus, a major requirement for their use is proper conditioning that, given current volumes of data, is fully automated. To this end, we introduce NeuroClean, an unsupervised, multipurpose EEG/LFP preprocessing pipeline. In addition to being complete and reliable, NeuroClean is an unsupervised series of algorithms intended to mitigate the reproducibility issues and biases caused by human intervention. The pipeline is designed as a five-step process that includes the common bandpass filtering, line noise filtering, and bad channel rejection. Beyond these standard steps, it incorporates an efficient independent component analysis with automatic component rejection based on a clustering algorithm. This machine learning classifier is used to ensure that task-relevant information is preserved after each step of the cleaning process. We used several data sets to validate the pipeline. NeuroClean removed several common types of artifacts from the signal. Moreover, in the context of motor tasks of varying complexity, an optimized Multinomial Logistic Regression model reached more than 97% accuracy on the cleaned data (chance level: 33.3%), compared with 74% on the raw data. These results show that NeuroClean is a promising pipeline and workflow that can be applied in future studies to achieve better generalization and performance in machine learning pipelines.
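
The clustering-based component rejection described above can be illustrated with a small sketch. It runs a minimal, hand-written DBSCAN over per-component feature vectors and flags every component that falls outside the largest dense cluster. This is a toy illustration and not the authors' implementation: the two-dimensional feature values, the `eps` and `min_samples` settings, the function names, and the "keep the largest cluster" rule are all assumptions made here for demonstration.

```python
import math
from collections import Counter


def dbscan(points, eps, min_samples):
    """Minimal DBSCAN. Returns one label per point; -1 marks noise."""
    n = len(points)
    # eps-neighbourhood of every point (a point counts as its own neighbour).
    neighbours = [
        [j for j in range(n) if math.dist(points[i], points[j]) <= eps]
        for i in range(n)
    ]
    labels = [None] * n  # None = not visited yet
    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbours[i]) < min_samples:
            labels[i] = -1  # provisional noise; may later become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(neighbours[i])
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reachable from a core point: border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(neighbours[j]) >= min_samples:
                queue.extend(neighbours[j])  # j is a core point, keep expanding
    return labels


def reject_components(features, eps, min_samples):
    """Return indices of components outside the largest cluster.

    Assumed rule (hypothetical): the largest dense cluster is treated as
    brain-like activity; noise points and smaller clusters are rejected.
    """
    labels = dbscan(features, eps, min_samples)
    counts = Counter(label for label in labels if label != -1)
    if not counts:
        return []  # no dense cluster found: reject nothing rather than everything
    keep = counts.most_common(1)[0][0]
    return [i for i, label in enumerate(labels) if label != keep]


# Made-up feature vectors for eight independent components (illustrative only).
FEATURES = [
    (0.10, 0.20), (0.12, 0.22), (0.11, 0.18),
    (0.09, 0.21), (0.13, 0.19), (0.10, 0.23),
    (0.90, 0.85),  # isolated point, far from the main group
    (0.20, 0.95),  # isolated point, far from the main group
]

rejected = reject_components(FEATURES, eps=0.1, min_samples=3)
print(rejected)  # [6, 7]
```

Because DBSCAN needs no labelled training data and no fixed number of clusters, a rule of this kind can run without manual inspection of components, which matches the unsupervised goal stated in the abstract. In practice the feature vectors would come from the ICA components rather than being typed in by hand, and the density parameters would need tuning per dataset.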
