
Toward the application of XAI methods in EEG-based systems

Andrea Apicella, Francesco Isgrò, Andrea Pollastro, Roberto Prevete

TL;DR

The paper tackles dataset shift in EEG-based BCIs, caused by the non-stationarity of EEG signals across sessions, by evaluating several XAI methods (Saliency, Guided BackPropagation, Layer-wise Relevance Propagation, Integrated Gradients, DeepLIFT) on EEG emotion recognition with the SEED dataset. It analyzes which input components the explanations identify as driving the classification, and tests whether those components transfer across sessions using MoRF (Most Relevant First) and LeRF (Least Relevant First) perturbation curves together with the derived AOPC (Area Over the Perturbation Curve) and ABPC (Area Between Perturbation Curves) measures, applied to features, bands, and channels. Key findings show that LRP, IG, and DeepLIFT provide more reliable explanations than Saliency or Guided BackPropagation, with inter-session explanations often more robust, although no single method yields components that generalize across all samples. The work is a promising step toward XAI-informed feature selection for improving cross-session generalization in EEG BCIs, and points to future directions in inter-subject generalization and improved EEG acquisition strategies.
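To make the MoRF/LeRF evaluation concrete, the following Python sketch builds both perturbation curves from a relevance ranking and reduces them to AOPC and ABPC scores. It is a minimal illustration under stated assumptions, not the authors' code: "removing" a component is modelled as overwriting it with a baseline value of 0, the curves track accuracy (as in the figures), and AOPC and ABPC are computed as step-averaged accuracy drops and gaps following their usual definitions, whose exact normalisation in the paper may differ. The classifier weights and samples are hypothetical toy values; relevance is gradient × input, which for a linear model coincides with Integrated Gradients against an all-zero baseline.

```python
from typing import Callable, List, Sequence, Tuple

Sample = List[float]
Relevance = Callable[[Sample], List[float]]


def morf_order(relevance: Relevance) -> Callable[[Sample], List[int]]:
    """Component indices sorted from most to least relevant (MoRF)."""
    def order(x: Sample) -> List[int]:
        r = relevance(x)
        return sorted(range(len(r)), key=lambda i: r[i], reverse=True)
    return order


def lerf_order(relevance: Relevance) -> Callable[[Sample], List[int]]:
    """Component indices sorted from least to most relevant (LeRF)."""
    def order(x: Sample) -> List[int]:
        return list(reversed(morf_order(relevance)(x)))
    return order


def perturbation_curve(
    predict: Callable[[Sample], int],
    data: Sequence[Tuple[Sample, int]],
    order: Callable[[Sample], List[int]],
    baseline: float = 0.0,
) -> List[float]:
    """Accuracy after cumulatively removing 0, 1, ..., n components.

    Assumption of this sketch: removing a component = overwriting it with `baseline`."""
    n = len(data[0][0])
    hits = [0] * (n + 1)
    for x, y in data:
        x_pert = list(x)
        hits[0] += int(predict(x_pert) == y)
        for k, i in enumerate(order(x), start=1):
            x_pert[i] = baseline          # remove the k-th ranked component
            hits[k] += int(predict(x_pert) == y)
    return [h / len(data) for h in hits]


def aopc(curve: List[float]) -> float:
    """Mean accuracy drop w.r.t. the unperturbed input, over all steps."""
    return sum(curve[0] - c for c in curve) / len(curve)


def abpc(morf: List[float], lerf: List[float]) -> float:
    """Mean gap between the LeRF and MoRF curves (larger = better ranking)."""
    return sum(le - mo for mo, le in zip(morf, lerf)) / len(morf)


# Hypothetical toy classifier: class 1 iff w . x > 0.
W = [3.0, 1.0, -0.5]

def predict(x: Sample) -> int:
    return int(sum(w * v for w, v in zip(W, x)) > 0)

def relevance(x: Sample) -> List[float]:
    # Gradient x input toward class 1 (all toy samples below are class 1;
    # in general, relevance should target the predicted or true class).
    return [w * v for w, v in zip(W, x)]

data = [([1.0, 0.2, 1.0], 1), ([0.2, 1.0, 0.0], 1)]
morf = perturbation_curve(predict, data, morf_order(relevance))
lerf = perturbation_curve(predict, data, lerf_order(relevance))
print(morf, lerf)                     # [1.0, 0.5, 0.0, 0.0] [1.0, 1.0, 1.0, 0.0]
print(aopc(morf), abpc(morf, lerf))   # 0.625 0.375
```

Passing a fixed ranking instead of a per-sample one, for example `lambda x: fixed_order` with `fixed_order` derived from training-set averages, would give a counterpart of the "averaged relevance" curves, while the per-sample ranking above corresponds to "effective relevance".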

Abstract

An interesting case of the well-known Dataset Shift problem is the classification of Electroencephalogram (EEG) signals in the context of Brain-Computer Interface (BCI). The non-stationarity of EEG signals can lead to poor generalisation performance when a BCI classification system is used across different sessions, even with the same subject. In this paper, we start from the hypothesis that the Dataset Shift problem can be alleviated by exploiting suitable eXplainable Artificial Intelligence (XAI) methods to locate and transform the characteristics of the input that are relevant for classification. In particular, we focus on an experimental analysis of the explanations produced by several XAI methods for a Machine Learning (ML) system trained on a typical EEG dataset for emotion recognition. Results show that many of the relevant components found by XAI methods are shared across sessions and can be used to build a system that generalises better. However, the relevant components of the input signal also appear to be highly dependent on the input itself.
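The hypothesis in the abstract, locating relevant input components on some data and using them to transform new inputs, can be sketched as relevance averaging followed by masking. The Python below is a hypothetical illustration, not the paper's pipeline: relevance is averaged component-wise over one "training" session, the `k` most relevant components are kept, and the remaining ones are overwritten with 0 in samples from another session. Averaging signed rather than absolute relevance, the value of `k`, and the zero baseline are all assumptions; the toy linear weights and samples are invented for the example.

```python
from typing import Callable, List, Sequence

Sample = List[float]


def averaged_relevance(
    relevance: Callable[[Sample], List[float]], train: Sequence[Sample]
) -> List[float]:
    """Component-wise mean of the explainer's relevance over training samples."""
    totals = [0.0] * len(train[0])
    for x in train:
        for i, r in enumerate(relevance(x)):
            totals[i] += r
    return [t / len(train) for t in totals]


def top_k_components(avg_rel: List[float], k: int) -> List[int]:
    """Indices of the k components with the highest averaged relevance."""
    order = sorted(range(len(avg_rel)), key=lambda i: avg_rel[i], reverse=True)
    return order[:k]


def keep_components(x: Sample, keep: List[int], baseline: float = 0.0) -> Sample:
    """Transform an input by overwriting every component not in `keep` (assumed baseline 0)."""
    kept = set(keep)
    return [v if i in kept else baseline for i, v in enumerate(x)]


# Hypothetical toy explainer: gradient x input of a linear model.
W = [2.0, 1.0, -1.0]

def relevance(x: Sample) -> List[float]:
    return [w * v for w, v in zip(W, x)]

session_1 = [[1.0, 1.0, 1.0], [3.0, 0.0, 1.0]]   # "training" session
avg = averaged_relevance(relevance, session_1)     # [4.0, 0.5, -1.0]
keep = top_k_components(avg, k=2)                  # [0, 1]
# A sample from another session, reduced to the selected components:
print(keep_components([0.5, 2.0, 3.0], keep))      # [0.5, 2.0, 0.0]
```

Because the selection is computed once on training data, no explanation is needed at test time; the price is that sample-specific relevance, which the abstract notes can vary strongly from input to input, is ignored.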

Paper Structure

This paper contains 14 sections, 3 equations, 5 figures.

Figures (5)

  • Figure 1: A general functional scheme of a Machine Learning (ML) architecture that uses XAI methods to select and transform relevant input features, with the aim of improving the performance of ML systems in the context of the dataset-shift problem.
  • Figure 2: MoRF (first column), AOPC (second column), LeRF (third column), and ABPC (fourth column) curves for the tested XAI methods, in both the intra-session (solid lines) and inter-session (dotted lines) settings, with features as signal components. Input components are scored using effective relevance (blue lines) and averaged relevance computed on training data (orange lines), and compared against a random component scoring (green lines). The $x$ and $y$ axes report the iteration step of the curve generation and the accuracy reached, respectively.
  • Figure 3: MoRF (first column), AOPC (second column), LeRF (third column), and ABPC (fourth column) curves for the tested XAI methods, in both the intra-session (solid lines) and inter-session (dotted lines) settings, with the delta, theta, alpha, beta, and gamma EEG bands as signal components. Input components are scored using effective relevance (blue lines) and averaged relevance computed on training data (orange lines), and compared against a random component scoring (green lines). The $x$ and $y$ axes report the iteration step of the curve generation and the accuracy reached, respectively.
  • Figure 4: MoRF (first column), AOPC (second column), LeRF (third column), and ABPC (fourth column) curves for the tested XAI methods, in both the intra-session (solid lines) and inter-session (dotted lines) settings, with the acquisition electrodes as signal components. Input components are scored using effective relevance (blue lines) and averaged relevance computed on training data (orange lines), and compared against a random component scoring (green lines). The $x$ and $y$ axes report the iteration step of the curve generation and the accuracy reached, respectively.
  • Figure 5: A first analysis of the discriminative power of each component alone. Signals containing only one component, taken in the relevance order given by the Explainer, are iteratively fed to the ML system. Results are reported for both the intra-session (solid lines) and inter-session (dotted lines) settings, considering features (first column), bands (second column), and electrodes (third column) as signal components. Input components are scored using effective relevance (blue lines) and averaged relevance computed on training data (orange lines), and compared against a random component scoring (green lines).
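Figures 3 to 5 treat bands and electrodes as components, which requires mapping feature-level relevance onto those groups. The Python sketch below assumes a hypothetical layout in which feature `channel * 5 + band` holds one value per electrode and band (SEED recordings use 62 electrodes, which under this layout would give 310 features), scores a group by summing its features' relevance, and mimics the Figure 5 procedure of feeding inputs that keep only one component at a time, in relevance order. Summation as the aggregation rule, the zero baseline, the `ch{c}` names, the single fixed ranking, and the toy classifier are assumptions for illustration only.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Sample = List[float]
BANDS = ["delta", "theta", "alpha", "beta", "gamma"]


def band_groups(n_channels: int) -> Dict[str, List[int]]:
    """Feature indices per band, assuming index = channel * len(BANDS) + band."""
    nb = len(BANDS)
    return {b: [c * nb + j for c in range(n_channels)] for j, b in enumerate(BANDS)}


def channel_groups(n_channels: int) -> Dict[str, List[int]]:
    """Feature indices per electrode under the same assumed layout."""
    nb = len(BANDS)
    return {f"ch{c}": [c * nb + j for j in range(nb)] for c in range(n_channels)}


def group_relevance(rel: List[float], groups: Dict[str, List[int]]) -> Dict[str, float]:
    """Score each component by summing the relevance of its features (assumption)."""
    return {g: sum(rel[i] for i in idx) for g, idx in groups.items()}


def single_component_curve(
    predict: Callable[[Sample], int],
    data: Sequence[Tuple[Sample, int]],
    groups: Dict[str, List[int]],
    ranked: List[str],
    baseline: float = 0.0,
) -> List[float]:
    """Accuracy at step k when each input keeps only the k-th ranked component."""
    curve = []
    for g in ranked:
        keep = set(groups[g])
        hits = sum(
            int(predict([v if i in keep else baseline for i, v in enumerate(x)]) == y)
            for x, y in data
        )
        curve.append(hits / len(data))
    return curve


# Toy example with 2 electrodes (10 features); all values are hypothetical.
def predict(x: Sample) -> int:
    return int(x[2] + x[7] > 1.0)   # class 1 iff total alpha-band value > 1

rel = [1.0, 0.0, 9.0, 2.0, 0.0, 1.0, 0.0, 8.0, 1.0, 0.5]  # e.g. averaged relevance
bands = band_groups(2)
scores = group_relevance(rel, bands)
ranked = sorted(scores, key=scores.get, reverse=True)
curve = single_component_curve(predict, [([1.0] * 10, 1), ([0.0] * 10, 0)], bands, ranked)
print(ranked)  # ['alpha', 'beta', 'delta', 'gamma', 'theta']
print(curve)   # [1.0, 0.5, 0.5, 0.5, 0.5]
```

Swapping `band_groups` for `channel_groups` gives the electrode-level variant; using every single feature as its own group recovers the feature-level case.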