Concept-based explainability for an EEG transformer model

Anders Gjølbye; William Lehn-Schiøler; Áshildur Jónsdóttir; Bergdís Arnardóttir; Lars Kai Hansen

TL;DR

The paper addresses the explainability of EEG transformer models by expressing their internal representations in terms of human-understandable concepts, using Testing with Concept Activation Vectors (TCAV). It introduces two concept-formation workflows, human-labeled EEG concepts and anatomically grounded resting-state concepts, and validates them with sanity checks and two practical applications: seizure prediction and brain-computer interfacing. By combining BENDR-based representations with TCAV and source localization via eLORETA, the work shows that concept-based explanations align with neurologically meaningful patterns and can inform model design and diagnostic support. The approach offers a data-driven way to interpret complex EEG models, with potential impact on clinical decision support and neurotechnology development.

Abstract

Deep learning models are complex due to their size, structure, and inherent randomness in training procedures. Additional complexity arises from the selection of datasets and inductive biases. Addressing these challenges for explainability, Kim et al. (2018) introduced Concept Activation Vectors (CAVs), which aim to understand deep models' internal states in terms of human-aligned concepts. These concepts correspond to directions in latent space, identified using linear discriminants. Although this method was first applied to image classification, it was later adapted to other domains, including natural language processing. In this work, we attempt to apply the method to electroencephalogram (EEG) data for explainability in Kostas et al.'s BENDR (2021), a large-scale transformer model. A crucial part of this endeavor involves defining the explanatory concepts and selecting relevant datasets to ground concepts in the latent space. Our focus is on two mechanisms for EEG concept formation: the use of externally labeled EEG datasets, and the application of anatomically defined concepts. The former approach is a straightforward generalization of methods used in image classification, while the latter is novel and specific to EEG. We present evidence that both approaches to concept formation yield valuable insights into the representations learned by deep EEG models.
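
For reference, the two quantities behind "directions in latent space" can be written as in Kim et al. (2018). The symbols below follow that paper and are an assumption here; they are not necessarily the notation used in this work's own three equations.

```latex
% v_C^l    : unit-norm CAV for concept C at layer l (normal of the linear separator)
% f_l(x)   : activations of input x at layer l
% h_{l,k}  : map from layer-l activations to the logit of class k
% X_k      : set of inputs labeled with class k
S_{C,k,l}(x) = \nabla h_{l,k}\big(f_l(x)\big) \cdot v_C^l
\qquad
\mathrm{TCAV}_{Q_{C,k,l}} = \frac{\left|\{\, x \in X_k : S_{C,k,l}(x) > 0 \,\}\right|}{|X_k|}
```

A score near 1 means that moving activations toward the concept raises the class logit for most class examples, and a score near 0 means it lowers it.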

Paper Structure (16 sections, 3 equations, 5 figures)

Introduction
Theory
BERT-inspired Neural Data Representations
Linear Head BENDR
Testing with Concept Activation Vectors (TCAV)
Source localization
Methods
Data
Training
Constructing Concepts
Experiments
Results
Sanity Checks
Event-based concepts
Anatomy/Frequency-Based Concepts
...and 1 more section

Figures (5)

Figure 1: An overview of using the TCAV method for EEG classification tasks with the Linear Head BENDR model: (1) Explanatory concepts are defined as either event-based EEG labels or frequency-based cortical activity, (2) Layer activations are extracted from a fine-tuned Linear Head BENDR, (3) Concept Activation Vectors (CAVs) are defined as the normal vector to the hyperplane separating layer activations for concept data from those of random examples, and (4) The sensitivity of class data to a concept at a specific bottleneck is defined as the directional derivative along the respective CAV.
Figure 2: The Linear Head BENDR (LHB) model architecture. The model consists of (1) a Feature encoder of six convolutional blocks, (2) an Encoding augment comprising masking and a convolutional contextualizer, (3) a Summarizer using Adaptive Average Pooling, (4) an Extended Classifier for dimensionality reduction, and (5) a Classifier.
Figure 3: Sanity checks for applying the TCAV method to EEG data and the bottlenecks of the LHB model. The figure presents the TCAV results for the Left Fist Movement class in a binary classification task using the MMIDB EEG dataset. From right to left, the concepts are defined as (1) Left Fist Movement class data, (2) Right Fist Movement class data, maximal mean activity in the alpha frequency band for the (3) Left Hemisphere and (4) Right Hemisphere, and (5) Eye Movement artifacts. Stars indicate either positive (a score above 0.5) or negative (a score below 0.5) statistical significance.
Figure 4: TCAV results assessing whether event-based EEG labels align with the internal representation of the seizure class data in the LHB model at the five bottlenecks. From the right, the concepts are defined as (1) technical artifacts (artf), (2) background (bckg), (3) generalized periodic epileptiform discharge (gped), (4) periodic lateralized epileptiform discharge (pled), and (5) spike and sharp wave (spsw). Stars indicate either positive (a score above 0.5) or negative (a score below 0.5) statistical significance.
Figure 5: Using TCAV, we analyzed the alignment between anatomical concepts in the alpha band and the internal representation of the Left Fist Movement class in the LHB model at five bottlenecks. The visualization shows five pairs of concepts, one pair per cortical area with one concept in each hemisphere, for the five areas deemed most relevant to the classification task. These concepts were chosen because they showed a higher deviation in the alpha band. Stars indicate either positive (a score above 0.5) or negative (a score below 0.5) statistical significance. Our analysis reveals significant lateralization in the Somatosensory and Motor Cortex across all five bottlenecks. Additionally, the Primary Visual Cortex (V1) was not significant for either hemisphere in any bottleneck.
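
Steps (3) and (4) of Figure 1, together with the 0.5 reference level used in Figures 3 to 5, can be illustrated with a minimal NumPy sketch. Everything in it is an assumption made for illustration: the activations are synthetic, the "head" is a toy tanh unit rather than the LHB model, and the linear separator is fit by least squares, which may differ from the linear classifier the authors used. The helper names `fit_cav`, `tcav_score`, and `make_toy_head` are hypothetical.

```python
import numpy as np


def fit_cav(concept_acts, random_acts):
    """Unit normal of a least-squares linear separator (concept = +1, random = -1)."""
    X = np.vstack([concept_acts, random_acts])
    y = np.concatenate([np.ones(len(concept_acts)), -np.ones(len(random_acts))])
    X1 = np.hstack([X, np.ones((len(X), 1))])  # append a bias column
    w, *_ = np.linalg.lstsq(X1, y, rcond=None)
    v = w[:-1]  # drop the bias term; only the direction matters
    return v / np.linalg.norm(v)


def tcav_score(grad_fn, class_acts, cav):
    """Fraction of class examples whose directional derivative along the CAV is positive."""
    sens = np.array([grad_fn(a) @ cav for a in class_acts])
    return float(np.mean(sens > 0))


def make_toy_head(W, u):
    """Gradient of the toy logit u . tanh(W a) with respect to the activations a."""
    def grad_fn(a):
        z = np.tanh(W @ a)
        return W.T @ (u * (1.0 - z ** 2))
    return grad_fn


# Synthetic bottleneck activations (assumption: 8 dimensions, Gaussian noise).
rng = np.random.default_rng(0)
d = 8
e0 = np.eye(d)[0]
random_acts = rng.normal(size=(200, d))
concept_acts = rng.normal(size=(200, d)) + 3.0 * e0  # concept shifts dimension 0
class_acts = rng.normal(size=(50, d))

cav = fit_cav(concept_acts, random_acts)

# A head whose logit increases with dimension 0, and one where it decreases.
score_pos = tcav_score(make_toy_head(np.eye(d), e0), class_acts, cav)
score_neg = tcav_score(make_toy_head(np.eye(d), -e0), class_acts, cav)
print(score_pos, score_neg)
```

In this toy setup the first head is aligned with the concept direction and the second is anti-aligned, so the two scores fall on opposite sides of 0.5. In practice, significance (the stars in the figures) would require repeating the CAV fit over several random sets and testing the resulting scores against 0.5, which this sketch does not do.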
