VSViG: Real-time Video-based Seizure Detection via Skeleton-based Spatiotemporal ViG
Yankun Xu, Junzhe Wang, Yun-Hsuan Chen, Jie Yang, Wenjie Ming, Shuang Wang, Mohamad Sawan
TL;DR
This work tackles the challenge of real-time, video-based epileptic seizure onset detection without relying on EEG by introducing VSViG, a skeleton-based spatiotemporal Vision Graph network that uses joint-centered patch embeddings. The method fine-tunes a pose estimator for epileptic patients, constructs a partitioned skeleton graph, and applies spatial and temporal graph convolutions, followed by a probabilistic, accumulative decision rule that detects onset with low latency. It achieves state-of-the-art accuracy (RMSE ≈ 5.9% for the full model) and efficiency (FLOPs ≈ 1.76G for VSViG; 0.44G for VSViG-Light), enables early detection (≈ 5.1 s after EEG onset and ≈ 13.1 s before clinical onset) with zero false detections in the tested cases, and offers interpretable visualizations of seizure-relevant partitions. The approach has practical potential for continuous remote monitoring and could be extended to other movement-related applications, such as Parkinson’s disease monitoring or fall detection.
Abstract
Accurate and efficient epileptic seizure onset detection can significantly benefit patients. Traditional diagnostic methods, which rely primarily on electroencephalograms (EEGs), often result in cumbersome and non-portable solutions, making continuous patient monitoring challenging. A video-based seizure detection system is expected to free patients from the constraints of scalp or implanted EEG devices and enable remote monitoring in residential settings. Previous video-based methods neither enable all-day monitoring nor provide short detection latency, owing to insufficient resources and ineffective patient action recognition techniques. Additionally, skeleton-based action recognition approaches remain limited in identifying subtle seizure-related actions. To address these challenges, we propose VSViG, a novel Video-based Seizure detection model built on a skeleton-based spatiotemporal Vision Graph neural network and designed for efficient, accurate, and timely detection in real-time scenarios. Our experimental results indicate that VSViG outperforms previous state-of-the-art action recognition models on our collected patients' video data, with higher accuracy (5.9% error), lower FLOPs (0.4G), and smaller model size (1.4M). Furthermore, by integrating a decision-making rule that combines output probabilities and an accumulative function, we achieve a 5.1 s detection latency after EEG onset, a 13.1 s detection advance before clinical onset, and a zero false detection rate. The project homepage is available at: https://github.com/xuyankun/VSViG/
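
To make the idea of a decision rule that combines output probabilities with an accumulative function more concrete, the following is a minimal illustrative sketch, not the rule used by VSViG. It assumes the model emits one seizure probability per video clip. The function name `detect_onset`, the parameters `prob_threshold`, `accum_threshold`, and `decay`, and the specific accumulate-and-decay scheme are all hypothetical choices made for illustration.

```python
from typing import Optional, Sequence


def detect_onset(
    probs: Sequence[float],
    prob_threshold: float = 0.5,   # hypothetical per-clip confidence cutoff
    accum_threshold: float = 2.0,  # hypothetical evidence level that raises an alarm
    decay: float = 1.0,            # hypothetical penalty for a low-probability clip
) -> Optional[int]:
    """Return the index of the first clip at which accumulated evidence
    reaches accum_threshold, or None if no alarm is raised."""
    score = 0.0
    for i, p in enumerate(probs):
        if p >= prob_threshold:
            # Confident clip: add its probability to the running evidence.
            score += p
        else:
            # Unconfident clip: reduce the evidence, never going below zero.
            score = max(0.0, score - decay)
        if score >= accum_threshold:
            return i
    return None


# Sustained high probabilities raise an alarm on the third clip (index 2).
sustained = detect_onset([0.9, 0.9, 0.9])
# Isolated spikes separated by low-probability clips never accumulate enough.
spiky = detect_onset([0.9, 0.1, 0.9, 0.1])
```

Under these assumptions, accumulation trades a small amount of latency for robustness: a single high-probability clip cannot trigger an alarm on its own, which is one plausible way to keep the false detection rate low.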
