D-FaST: Cognitive Signal Decoding with Disentangled Frequency-Spatial-Temporal Attention

Weiguo Chen; Changjian Wang; Kele Xu; Yuan Yuan; Yanru Bai; Dongsong Zhang

TL;DR

D-FaST introduces a disentangled frequency-spatial-temporal attention framework for EEG-based cognitive signal decoding, addressing limitations of single-domain and serial multi-domain approaches. It deploys three dedicated modules (Multi-View Attention for frequency features, Dynamic Connectogram Attention for spatial features, and Local Temporal Sliding Attention for temporal features) and fuses them through a parallel, disentangled pathway to build robust, global representations. The method achieves state-of-the-art results on MNRED and generalizes well to ZuCo and BCIC IV-2A/2B, with extensive ablations and hyperparameter analyses supporting the contribution of each component. The work offers practical advances for cognitive language processing (CLP) and cognitive signal decoding (CSD) as well as methodological guidance for multi-domain EEG decoding, including a new Mandarin reading EEG dataset and visualizations of frequency and connectivity dynamics.
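To make the difference between serial and disentangled composition concrete, here is a minimal NumPy sketch. It is not the authors' implementation: the frequency, spatial, and temporal stages are hypothetical stand-ins (band-averaged FFT magnitudes, per-window channel correlations, and mean/max pooling over windows) in place of MVA, DCA, and LTSA, and the window length, band count, and low-pass cutoff are arbitrary. What it illustrates is the data flow: in the disentangled variant the frequency and spatial branches both read the raw windows and their outputs are concatenated before the temporal stage, whereas in the serial variant the spatial stage only sees what the frequency stage passes on.

```python
import numpy as np


def to_windows(x, win):
    """Split a (channels, samples) signal into non-overlapping windows: (W, C, win)."""
    c, t = x.shape
    w = t // win
    return x[:, : w * win].reshape(c, w, win).transpose(1, 0, 2)


def freq_features(windows, n_bands=4):
    """Stand-in frequency branch (not MVA): mean FFT magnitude in equal-width bands."""
    mag = np.abs(np.fft.rfft(windows, axis=-1))                  # (W, C, F)
    bands = np.array_split(mag, n_bands, axis=2)                 # n_bands x (W, C, F_b)
    feats = np.stack([b.mean(axis=-1) for b in bands], axis=-1)  # (W, C, n_bands)
    return feats.reshape(feats.shape[0], -1)                     # (W, C * n_bands)


def spatial_features(windows):
    """Stand-in spatial branch (not DCA): upper triangle of each window's correlation matrix."""
    iu = np.triu_indices(windows.shape[1], k=1)
    return np.stack([np.corrcoef(w)[iu] for w in windows])       # (W, C * (C - 1) / 2)


def temporal_features(seq):
    """Stand-in temporal stage (not LTSA): mean and max over the window axis."""
    return np.concatenate([seq.mean(axis=0), seq.max(axis=0)])


def lowpass(windows, keep):
    """Zero every FFT bin from index `keep` upward, then transform back to time."""
    spec = np.fft.rfft(windows, axis=-1)
    spec[..., keep:] = 0
    return np.fft.irfft(spec, n=windows.shape[-1], axis=-1)


def serial(x, win=64, keep=8):
    """Frequency -> spatial -> temporal: the spatial stage only sees the filtered signal."""
    filtered = lowpass(to_windows(x, win), keep)
    return temporal_features(spatial_features(filtered))


def disentangled(x, win=64):
    """Frequency and spatial branches read the same raw windows in parallel, then fuse."""
    wins = to_windows(x, win)
    fused = np.concatenate([freq_features(wins), spatial_features(wins)], axis=1)
    return temporal_features(fused)


rng = np.random.default_rng(0)
eeg = rng.standard_normal((8, 640))   # hypothetical trial: 8 channels, 640 samples
print(disentangled(eeg).shape)        # (120,)
print(serial(eeg).shape)              # (56,)
```

With 8 channels and 10 windows, the disentangled path keeps 32 frequency plus 28 spatial features per window (120 after mean/max pooling), while the serial path retains only the 28 spatial features computed on the already-filtered signal.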

Abstract

Cognitive Language Processing (CLP), situated at the intersection of Natural Language Processing (NLP) and cognitive science, plays an increasingly pivotal role in artificial intelligence, cognitive intelligence, and brain science. Cognitive Signal Decoding (CSD), one of the essential areas of investigation in CLP, has achieved remarkable progress, yet challenges remain: insufficient global dynamic representation capability and deficiencies in multi-domain feature integration. In this paper, we introduce a novel paradigm for CLP referred to as Disentangled Frequency-Spatial-Temporal Attention (D-FaST). Specifically, we present a novel cognitive signal decoder that operates on disentangled frequency-spatial-temporal domain attention. This decoder comprises three key components: frequency-domain feature extraction employing multi-view attention, spatial-domain feature extraction utilizing dynamic brain connection graph attention, and temporal feature extraction relying on local time sliding window attention. These components are integrated within a novel disentangled framework. Additionally, to encourage advances in this field, we have created a new CLP dataset, MNRED. We then conducted an extensive series of experiments evaluating D-FaST's performance on MNRED as well as on the publicly available datasets ZuCo, BCIC IV-2A, and BCIC IV-2B. Our experimental results demonstrate that D-FaST significantly outperforms existing methods on both our dataset and traditional CSD datasets, establishing a state-of-the-art accuracy of 78.72% on MNRED and pushing accuracy to 78.35% on ZuCo, 74.85% on BCIC IV-2A, and 76.81% on BCIC IV-2B.
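The abstract names local time sliding window attention as the temporal component. A common way to realize sliding-window attention is ordinary scaled dot-product self-attention with a banded mask, so that each time step attends only to neighbours within a fixed radius. The NumPy sketch below shows that technique under illustrative assumptions (a single head, random projection matrices, radius 2, hypothetical sizes); the paper's LTSA may differ in windowing, number of heads, and parameterization.

```python
import numpy as np


def local_window_attention(x, wq, wk, wv, radius):
    """Single-head self-attention where step t attends only to steps s with |t - s| <= radius.

    x: (T, D) per-time-step features; wq, wk, wv: (D, D_h) projection matrices.
    Returns the attended sequence (T, D_h) and the attention weights (T, T).
    """
    t = x.shape[0]
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = q @ k.T / np.sqrt(k.shape[1])                       # (T, T)
    idx = np.arange(t)
    in_window = np.abs(idx[:, None] - idx[None, :]) <= radius    # banded local mask
    scores = np.where(in_window, scores, -np.inf)
    scores = scores - scores.max(axis=1, keepdims=True)          # stable softmax
    weights = np.exp(scores)                                     # exp(-inf) = 0 outside window
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ v, weights


rng = np.random.default_rng(0)
T, D, D_h = 12, 16, 8                    # hypothetical sizes
x = rng.standard_normal((T, D))
wq, wk, wv = (rng.standard_normal((D, D_h)) / np.sqrt(D) for _ in range(3))
out, w = local_window_attention(x, wq, wk, wv, radius=2)
print(out.shape)                                        # (12, 8)
print(np.count_nonzero(w[0]), np.count_nonzero(w[6]))   # 3 5
```

Because the mask is applied before the softmax, positions outside the window receive exactly zero weight; the first row has only three admissible positions because the window is truncated at the sequence boundary. The dense T x T mask is used here for clarity; banded implementations avoid materializing the full score matrix.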

Paper Structure (39 sections, 6 equations, 9 figures, 16 tables, 1 algorithm)

Introduction
Related work
Cognitive Language Processing
Frequency feature extraction
Spatial feature extraction
Temporal feature extraction
Multidomain feature fusion
Methodology
Problem Definition
Overview of D-FaST
Frequency-Spatial-Temporal Attention
Multi-View Attention (MVA) for Frequency Feature Extraction
Dynamic Connectogram Attention (DCA) for Spatial Feature Extraction
Local Temporal Sliding Attention (LTSA)
Disentangled Frequency-Spatial Feature Extraction
...and 24 more sections

Figures (9)

Figure 1: Conceptual comparison of four brain signal decoding architectures. (a): The Single-Domain (1D) Architecture primarily focuses on the extraction of spatial domain information from cognitive signals. (b): The Double-Domain (2D) Serial Architecture predominantly extracts both spatial and temporal domain information, either in different orders or simultaneously. (c): The Triple-Domain (3D) Serial Architecture sequentially extracts information from the frequency domain, spatial domain, and temporal domain. (d): The Triple-Domain Disentangled Architecture initially processes cognitive signals through the frequency and spatial domains, resulting in separate frequency and spatial features.
Figure 2: t-SNE projections of features extracted by EEGNet R1 under different strategies: (a) Serial Framework (Vanilla), (b) Disentangled Framework (Ours). The dashed circles indicate the range of the projected features. Visualization details are available in our open-source code.
Figure 3: The overarching architecture of D-FaST. The dashed boxes give detailed descriptions of the corresponding modules. The three diagrams on the left break down the neural networks within the MVA, DCA, and LTSA modules. The rightmost section illustrates the interconnections between these three modules.
Figure 4: Frequency domain information coding process of multi-view attention.
Figure 5: Dynamic connectogram and dynamic connection matrix of each window.
...and 4 more figures
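Figure 5 refers to a dynamic connectogram with one connection matrix per window. A minimal sketch of that idea, assuming Pearson correlation as the connectivity measure and overlapping sliding windows, is given below; the paper may use a different measure. The second function is a hypothetical stand-in for connectogram-based attention, not the paper's DCA: it turns each window's absolute correlations into row-wise softmax weights and uses them to mix per-channel features. Function names, window length, stride, and temperature are illustrative only.

```python
import numpy as np


def dynamic_connectivity(x, win, stride):
    """Pearson correlation between channels inside each sliding window.

    x: (C, T) signal. Returns (W, C, C) with W = (T - win) // stride + 1.
    """
    _, t = x.shape
    starts = range(0, t - win + 1, stride)
    return np.stack([np.corrcoef(x[:, s : s + win]) for s in starts])


def connectivity_attention(features, conn, temperature=1.0):
    """Hypothetical stand-in for connectogram attention (not the paper's DCA).

    features: (W, C, D) per-window channel features; conn: (W, C, C).
    Each channel aggregates all channels with softmax(|corr| / temperature) weights.
    """
    logits = np.abs(conn) / temperature
    logits = logits - logits.max(axis=-1, keepdims=True)   # stable softmax
    weights = np.exp(logits)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ features                               # batched: (W, C, D)


rng = np.random.default_rng(1)
eeg = rng.standard_normal((6, 500))        # hypothetical trial: 6 channels, 500 samples
conn = dynamic_connectivity(eeg, win=100, stride=50)
print(conn.shape)                          # (9, 6, 6)
feats = rng.standard_normal((conn.shape[0], 6, 4))   # hypothetical per-channel features
print(connectivity_attention(feats, conn).shape)     # (9, 6, 4)
```

A thresholded |corr| matrix could instead serve as a sparse graph adjacency; the dense softmax form is chosen here only to keep the sketch short.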
