
Explainable Interictal Epileptiform Discharge Detection Method Based on Scalp EEG and Retrieval-Augmented Generation

Yu Zhu, Jiayang Guo, Jun Jiang, Peipei Gu, Xin Shu, Duo Chen

TL;DR

This study proposes IED-RAG, an explainable multimodal framework for joint IED detection and report generation that employs a dual-encoder to extract electrophysiological and semantic features, aligned via contrastive learning in a shared EEG-text embedding space.

Abstract

The detection of interictal epileptiform discharge (IED) is crucial for the diagnosis of epilepsy, but automated methods often lack interpretability. This study proposes IED-RAG, an explainable multimodal framework for joint IED detection and report generation. Our approach employs a dual-encoder to extract electrophysiological and semantic features, aligned via contrastive learning in a shared EEG-text embedding space. During inference, clinically relevant EEG-text pairs are retrieved from a vector database as explicit evidence to condition a large language model (LLM) for the generation of evidence-based reports. Evaluated on a private dataset from Wuhan Children's Hospital and the public TUH EEG Events Corpus (TUEV), the framework achieved balanced accuracies of 89.17% and 71.38%, with BLEU scores of 89.61% and 64.14%, respectively. The results demonstrate that retrieval of explicit evidence enhances both diagnostic performance and clinical interpretability compared to standard black-box methods.
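
The retrieval step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: a brute-force NumPy search stands in for the vector database, the embeddings are random toy data, and the names `retrieve_top_k` and `build_prompt`, the prompt wording, and the example reports and labels are all hypothetical.

```python
import numpy as np

def l2_normalize(x, eps=1e-12):
    """Scale vectors to unit length so a dot product equals cosine similarity."""
    norms = np.linalg.norm(x, axis=-1, keepdims=True)
    return x / np.maximum(norms, eps)

def retrieve_top_k(query, index_embeddings, k=3):
    """Return indices and cosine similarities of the k nearest stored embeddings."""
    q = l2_normalize(np.asarray(query, dtype=np.float64))
    db = l2_normalize(np.asarray(index_embeddings, dtype=np.float64))
    sims = db @ q                                   # cosine similarity per stored case
    k = min(k, sims.shape[0])
    order = np.argsort(-sims, kind="stable")[:k]    # highest similarity first
    return order, sims[order]

def build_prompt(reports, labels, neighbor_ids):
    """Assemble retrieved reports into a prompt (hypothetical wording)."""
    lines = ["Retrieved evidence from similar EEG cases:"]
    for rank, idx in enumerate(neighbor_ids, start=1):
        lines.append(f"[{rank}] label={labels[idx]}: {reports[idx]}")
    lines.append("Write a report for the query segment grounded in this evidence.")
    return "\n".join(lines)

# Toy index: 5 stored cases with 512-dimensional embeddings (random, illustrative only).
rng = np.random.default_rng(0)
index_embeddings = rng.normal(size=(5, 512))
reports = [f"hypothetical report {i}" for i in range(5)]
labels = ["IED", "non-IED", "IED", "non-IED", "IED"]

# A query that is a slightly perturbed copy of stored case 2.
query = index_embeddings[2] + 0.01 * rng.normal(size=512)
neighbor_ids, neighbor_sims = retrieve_top_k(query, index_embeddings, k=3)
prompt = build_prompt(reports, labels, neighbor_ids)
```

Because the retrieved reports and labels are placed in the prompt verbatim, each statement in a generated report can in principle be traced back to a specific retrieved case, which is the interpretability argument the abstract makes.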

Paper Structure (32 sections, 8 equations, 7 figures, 6 tables)


Figures (7)

  • Figure 1: Overview of the Proposed IED Detection Framework. The workflow illustrates the evidence-grounded analysis process: A query EEG segment $X_q$ is input into the system. The core IED-RAG framework retrieves clinically relevant evidence from an external knowledge base to ground the inference. The system produces dual outputs: a binary detection decision $\hat{y}$ and an interpretable clinical report $T$.
  • Figure 2: Schematic overview of the unified EEG data preprocessing pipeline. The workflow follows a sequential progression across four stages: (1) Data Acquisition: Raw multichannel EEG signals containing physiological noise and baseline drift. (2) Signal Cleaning: A composite stage integrating band-pass filtering (0.5--50 Hz) and Independent Component Analysis (ICA). ICA decomposes signals into source components, allowing for the specific rejection of ocular and myogenic artifacts while preserving neural activity. (3) Dense Segmentation: The cleaned continuous EEG is partitioned using a sliding-window strategy with high temporal overlap to maximize data utilization. (4) Standardization: The final output consists of aligned, fixed-length tensors ready for model ingestion.
  • Figure 3: Overview of the proposed explainable multimodal RAG framework. The architecture follows a two-phase design: (A) Indexing Phase (top) and (B) Inference Phase (bottom). (A) Indexing: Paired training EEG segments and expert-authored clinical reports are encoded by a dual-encoder model, where a Deep4Net-based EEG encoder and a BERT-based text encoder project multimodal inputs into a shared embedding space. The EEG embeddings are indexed in a FAISS vector database together with their associated report texts and labels, forming a searchable multimodal vector database. (B) Inference: Given a query EEG segment, the same EEG encoder (shared weights) produces a query embedding, which retrieves the Top-$K$ most similar historical EEG cases via nearest-neighbor search (cosine similarity). The retrieved report texts are then assembled with task instructions into a retrieval-augmented prompt to condition an LLM, generating an evidence-grounded and interpretable EEG report.
  • Figure 4: Architecture of the Cross-Modal Contrastive Learning Framework. The model employs a dual-tower structure to align electrophysiological signals with clinical narratives in a shared embedding space. (Left) EEG Encoder (Deep4Net): The encoder processes raw EEG inputs ($C=19, T=2500$) through a specialized hierarchy: 1) Temporal Convolution: Extracts frequency features using $(1, 10)$ kernels, expanding feature depth to 25. 2) Spatial Convolution: Aggregates spatial information across all 19 channels using $(19, 1)$ kernels, compressing the spatial dimension to 1. 3) Hierarchical Pooling: Subsequent blocks progressively increase feature depth ($25 \to 200$) while reducing temporal resolution via Max Pooling. The final output is projected to a 512-dimensional embedding vector $\mathbf{v}_e$. (Right) Text Encoder (BERT): Clinical reports are tokenized and processed by a pre-trained BERT-base model. The global semantic context is extracted via the [CLS] token and projected to a text embedding vector $\mathbf{v}_t \in \mathbb{R}^{512}$. (Center) Joint Optimization: The network is trained using a symmetric InfoNCE loss. The heatmap illustrates the objective: maximizing cosine similarity for matched positive pairs (diagonal, dark squares) while minimizing it for unmatched negative pairs (off-diagonal) within the batch.
  • Figure 5: Interpretability Case Study: Evidence-Grounded EEG Report Generation via Multimodal RAG. (Left) Input and query: An EEG segment containing a suspected IED is provided as the query. (Middle) Evidence retrieval: Clinically relevant EEG--report pairs are retrieved from a FAISS-based vector database according to the learned EEG--text embedding similarity. The retrieved neighbors exhibit spike--wave morphology consistent with the query, and their associated clinician reports provide explicit diagnostic context. (Right) Generation and traceability: The final EEG report is generated by a large language model conditioned on the retrieved evidence, enabling case-based traceability by linking the generated statements to the retrieved clinical precedents. Overall, transparent evidence tracing is enabled by explicitly exposing retrieved neighbors and their original reports, thereby supporting interpretable IED detection and EEG report generation.
  • ...and 2 more figures
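
The symmetric InfoNCE objective named in the Figure 4 caption can be sketched as follows. This is a minimal NumPy illustration under stated assumptions, not the authors' training code: the temperature value of 0.07 is an assumption (the captions do not give one), the function name `symmetric_info_nce` is hypothetical, and random vectors stand in for the 512-dimensional outputs $\mathbf{v}_e$ and $\mathbf{v}_t$ of the EEG and text encoders.

```python
import numpy as np

def log_softmax(logits, axis):
    """Numerically stable log-softmax along the given axis."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))

def symmetric_info_nce(eeg_emb, text_emb, temperature=0.07):
    """Average of EEG-to-text and text-to-EEG InfoNCE losses over a batch.

    Row i of eeg_emb and row i of text_emb form the positive pair;
    every other pairing in the batch is treated as a negative.
    The temperature is an assumed value, not one taken from the paper.
    """
    e = eeg_emb / np.linalg.norm(eeg_emb, axis=1, keepdims=True)
    t = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    logits = (e @ t.T) / temperature                 # scaled cosine similarity matrix
    diag = np.arange(logits.shape[0])
    loss_e2t = -log_softmax(logits, axis=1)[diag, diag].mean()  # each EEG picks its text
    loss_t2e = -log_softmax(logits, axis=0)[diag, diag].mean()  # each text picks its EEG
    return 0.5 * (loss_e2t + loss_t2e)

# Toy batch of 8 pairs with 512-dimensional embeddings (random, illustrative only).
rng = np.random.default_rng(0)
v_e = rng.normal(size=(8, 512))

# Matched pairs: text embeddings are near-copies of the EEG embeddings.
aligned_loss = symmetric_info_nce(v_e, v_e + 0.01 * rng.normal(size=(8, 512)))

# Mismatched pairs: text embeddings are shifted by one position in the batch.
shuffled_loss = symmetric_info_nce(v_e, np.roll(v_e, 1, axis=0))
```

In this toy setup the loss is small when the diagonal of the similarity matrix dominates and large when the matching pairs sit off the diagonal, which mirrors the heatmap described in the caption.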