EEGUnity: Open-Source Tool in Facilitating Unified EEG Datasets Towards Large-Scale EEG Model

Chengxuan Qin, Rui Yang, Wenlong You, Zhige Chen, Longsheng Zhu, Mengjie Huang, Zidong Wang

TL;DR

EEGUnity is an open-source tool that incorporates "EEG Parser", "Correction", "Batch Processing", and "Large Language Model Boost" modules to support efficient management of multiple EEG datasets, including intelligent data structure inference, data cleaning, and data unification.

Abstract

The growing number of dispersed electroencephalogram (EEG) dataset publications and the advancement of large-scale EEG models have increased the demand for practical tools to manage diverse EEG datasets. However, the inherent complexity of EEG data, characterized by variability in content data, metadata, and data formats, makes it difficult to integrate multiple datasets and to conduct large-scale EEG model research. To tackle these challenges, this paper introduces EEGUnity, an open-source tool that incorporates 'EEG Parser', 'Correction', 'Batch Processing', and 'Large Language Model Boost' modules. Using these modules, EEGUnity supports efficient management of multiple EEG datasets through intelligent data structure inference, data cleaning, and data unification. These capabilities help ensure high data quality and consistency, providing a reliable foundation for large-scale EEG data research. EEGUnity is evaluated on 25 EEG datasets from different sources, and several typical batch processing workflows are presented. The results demonstrate the high performance and flexibility of EEGUnity in parsing and data processing. The project code is publicly available at github.com/Baizhige/EEGUnity.

Paper Structure

This paper contains 12 sections, 6 figures, and 6 tables.

Figures (6)

  • Figure 1: Objective of Proposed EEGUnity.
  • Figure 2: Three Approaches for Managing Datasets in EEGUnity. "U~Dataset" refers to UnifiedDataset.
  • Figure 3: Structure of the UnifiedDataset.
  • Figure 4: Frequency Visualization Results of Correction Module. The figure displays magnitude-frequency curves across four subfigures, with each subfigure corresponding to one of the frequency bands: alpha, beta, theta, and gamma. The samples visualized are randomly selected within a domain specified by a "domain tag". The average curve for each band is represented in blue, while individual curves for each data sample are depicted in grey.
  • Figure 5: Channel Correlation Visualization Results of Correction Module. The figure presents channel correlation for samples that are randomly selected within a domain identified by a "domain tag". The number of samples to be visualized is adjustable via a parameter. Subfigure placement is automatically optimized for ease of review. This figure allows users to inspect specific aspects of the samples, such as low-frequency noise or channel noise, facilitating detailed analysis [FilteringU0_2023_yan].
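Figures 4 and 5 describe two checks performed by the Correction module. The first is a set of band-limited magnitude-frequency curves for randomly chosen samples that share a domain tag, together with their average. The second is a channel correlation matrix.

The following is a minimal, dependency-free Python sketch of those computations, written under explicit assumptions:

- The band edges are conventional values, not numbers taken from EEGUnity.
- The record layout with a `domain_tag` key is hypothetical.
- The helper names (`pick_samples`, `band_curves`, `channel_correlation`) are hypothetical and are not EEGUnity's API.
- A naive DFT stands in for whatever spectral estimator EEGUnity actually uses.

```python
import cmath
import math
import random

# Assumed, conventional EEG band edges in Hz (not taken from EEGUnity).
BANDS = {"theta": (4.0, 8.0), "alpha": (8.0, 13.0), "beta": (13.0, 30.0), "gamma": (30.0, 45.0)}


def pick_samples(records, domain_tag, k, seed=0):
    """Randomly choose up to k records whose hypothetical 'domain_tag' field matches."""
    pool = [r for r in records if r["domain_tag"] == domain_tag]
    return random.Random(seed).sample(pool, min(k, len(pool)))


def magnitude_spectrum(signal, fs):
    """One-sided magnitude spectrum of a real signal via a naive O(n^2) DFT."""
    n = len(signal)
    freqs, mags = [], []
    for k in range(n // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(signal))
        freqs.append(k * fs / n)
        mags.append(abs(coeff) / n)
    return freqs, mags


def band_curves(samples, fs, band):
    """Per-sample magnitude curves inside one band, plus their average curve."""
    lo, hi = BANDS[band]
    spectra = [magnitude_spectrum(sig, fs) for sig in samples]
    freqs = spectra[0][0]  # assumes all samples share length and sampling rate
    keep = [i for i, f in enumerate(freqs) if lo <= f < hi]
    curves = [[mags[i] for i in keep] for _, mags in spectra]
    average = [sum(col) / len(col) for col in zip(*curves)]
    return [freqs[i] for i in keep], curves, average


def channel_correlation(channels):
    """Pearson correlation matrix between channels (assumes no constant channel)."""
    def pearson(a, b):
        ma, mb = sum(a) / len(a), sum(b) / len(b)
        cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
        sa = math.sqrt(sum((x - ma) ** 2 for x in a))
        sb = math.sqrt(sum((y - mb) ** 2 for y in b))
        return cov / (sa * sb)
    return [[pearson(a, b) for b in channels] for a in channels]


# Usage on synthetic data: a 10 Hz "alpha" sine, and the same sine plus 50 Hz noise.
fs = 128  # hypothetical sampling rate (Hz)
t = [i / fs for i in range(fs)]  # 1 s of data
alpha_wave = [math.sin(2 * math.pi * 10 * x) for x in t]
noisy = [a + 0.1 * math.sin(2 * math.pi * 50 * x) for a, x in zip(alpha_wave, t)]
records = [
    {"domain_tag": "A", "data": alpha_wave},
    {"domain_tag": "B", "data": [0.0] * fs},
    {"domain_tag": "A", "data": noisy},
]
chosen = pick_samples(records, "A", k=5)
band_freqs, curves, average = band_curves([r["data"] for r in chosen], fs, "alpha")
corr = channel_correlation([alpha_wave, noisy])
print(band_freqs[average.index(max(average))])  # 10.0: the synthetic alpha peak
print(round(corr[0][1], 3))
```

The O(n²) DFT keeps the sketch self-contained. A practical version would use an FFT routine such as `numpy.fft.rfft`, and would likely plot `curves` in grey and `average` in blue, as the Figure 4 caption describes.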