
Advancing Brainwave-Based Biometrics: A Large-Scale, Multi-Session Evaluation

Matin Fallahi, Patricia Arias-Cabarcos, Thorsten Strufe

TL;DR

This work addresses the generalizability and long-term reliability of EEG-based biometrics by leveraging a large-scale public dataset (PEERS) with 345 subjects and 6,007 sessions across five years and three headsets. It shows that deep metric-learning pipelines (notably ResNet1D with SupConLoss and Euclidean comparison) outperform handcrafted features, while performance degrades over time unless enrollment data are refreshed. The study also demonstrates that meaningful authentication is feasible with consumer-grade EEG channel counts and cross-device training, although performance still falls short of international biometric standards, highlighting the need for much larger training sets and improved architectures. The authors provide open-source code to enable reproducible research and encourage community-driven progress toward scalable, durable brainwave-based authentication.
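The TL;DR credits the strongest pipeline to ResNet1D trained with SupConLoss. As a rough illustration of that training objective only, the sketch below computes the supervised contrastive loss of Khosla et al. (2020) in pure Python for a small batch of embeddings labeled by subject: each anchor is pulled toward other samples of the same subject and pushed away from every other sample in the batch. It is not the authors' implementation; the function names, the default temperature of 0.1, and averaging over anchors are assumptions.

```python
import math
from typing import List, Sequence


def l2_normalize(v: Sequence[float]) -> List[float]:
    """Scale a vector to unit Euclidean norm (SupCon uses normalized embeddings)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def supcon_loss(embeddings: Sequence[Sequence[float]],
                labels: Sequence[int],
                temperature: float = 0.1) -> float:
    """Supervised contrastive loss, averaged over anchors that have a positive.

    Hypothetical helper: for anchor i, positives are the other samples with the
    same label (same subject, e.g. from other sessions); the softmax denominator
    runs over every other sample. The temperature default is an assumption.
    """
    z = [l2_normalize(e) for e in embeddings]
    n = len(z)
    # Similarity logits z_i . z_j / temperature
    sim = [[sum(a * b for a, b in zip(z[i], z[j])) / temperature for j in range(n)]
           for i in range(n)]
    total, anchors = 0.0, 0
    for i in range(n):
        others = [a for a in range(n) if a != i]
        positives = [p for p in others if labels[p] == labels[i]]
        if not positives:
            continue  # an anchor without positives contributes nothing
        # log(sum_{a != i} exp(sim[i][a])), computed with the max-shift trick
        m = max(sim[i][a] for a in others)
        log_denom = m + math.log(sum(math.exp(sim[i][a] - m) for a in others))
        # Mean negative log-probability of the positives for this anchor
        total += -sum(sim[i][p] - log_denom for p in positives) / len(positives)
        anchors += 1
    return total / anchors if anchors else 0.0
```

Because the embeddings are L2-normalized, squared Euclidean distance equals 2 minus twice the cosine similarity, so a Euclidean comparison at verification time ranks pairs exactly as cosine similarity would.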

Abstract

The field of brainwave-based biometrics has gained attention for its potential to revolutionize user authentication through hands-free interaction, resistance to shoulder surfing, continuous authentication, and revocability. However, current research often relies on single-session or limited-session datasets with fewer than 55 subjects, raising concerns about the generalizability of the findings. To address this gap, we conducted a large-scale study using a public brainwave dataset comprising 345 subjects and 6,007 sessions (an average of 17 per subject) recorded over five years using three headsets. Our results reveal that deep learning approaches significantly outperform hand-crafted feature extraction methods. We also observe that the Equal Error Rate (EER) increases over time (e.g., from 6.7% after 1 day to 14.3% after a year). Therefore, the enrollment set needs to be reinforced after successful login attempts. Moreover, we demonstrate that fewer brainwave measurement sensors (EEG channels) can be used with an acceptable increase in EER, which is necessary for transitioning from medical-grade to affordable consumer-grade devices. Finally, we compare our results to prior work and existing biometric standards. While our performance is on par with or exceeds previous approaches, it still falls short of industrial benchmarks. Based on these results, we hypothesize that further improvements are possible with larger training sets. To support future research, we have open-sourced our analysis code.
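The abstract recommends reinforcing the enrollment set after successful logins to counter the rising EER, but it does not specify an update policy. The following is a minimal sketch under assumed choices: a probe embedding is scored by its mean Euclidean distance to the enrolled embeddings, accepted if that score is at most a fixed threshold, and then appended to a first-in, first-out enrollment set of bounded size. The class name `EnrollmentVerifier` and its parameters are hypothetical.

```python
import math
from typing import List, Sequence


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two embeddings."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


class EnrollmentVerifier:
    """Hypothetical verifier with enrollment refresh (policy is an assumption).

    Accepts a probe whose mean Euclidean distance to the enrolled embeddings
    is at most `threshold`, then adds the accepted probe to the enrollment set.
    """

    def __init__(self, enrollment: List[List[float]], threshold: float,
                 max_size: int = 50):
        self.enrollment = [list(e) for e in enrollment]
        self.threshold = threshold
        self.max_size = max_size  # cap so the oldest samples eventually age out

    def score(self, probe: Sequence[float]) -> float:
        """Mean distance from the probe to every enrolled embedding."""
        return sum(euclidean(probe, e) for e in self.enrollment) / len(self.enrollment)

    def verify(self, probe: Sequence[float]) -> bool:
        accepted = self.score(probe) <= self.threshold
        if accepted:
            # Reinforce the enrollment set with the successful login sample
            self.enrollment.append(list(probe))
            if len(self.enrollment) > self.max_size:
                self.enrollment.pop(0)  # drop the oldest enrolled sample
        return accepted
```

Only accepted probes are added, so a falsely accepted impostor would also enter the template; bounding the set with `max_size` is one simple way to let old sessions age out as the user's signal drifts.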

Paper Structure

This paper contains 35 sections, 9 figures, 11 tables.

Figures (9)

  • Figure 1: Number of sessions in the evaluation dataset post-enrollment. $W$, $M$, and $Y$ denote week, month, and year. Labels (e.g., $W3$) mark the number of sessions in that time unit, starting the day after the previous unit ends (e.g., $W3$: days 15–21).
  • Figure 2: EER vs. Number of Subjects: Each blue dot represents the average of 50 instances of randomly selecting $N$ subjects and calculating the EER. The dot size represents the standard deviation of the EER values across these 50 calculations.
  • Figure 3: Relationship between the time interval between sessions and EER. $D$ represents a day, $W$ a week, $M$ a month, and $Y$ a year. The value after each symbol indicates the start time interval (except for days). For example, $W1$ represents sessions with a time interval of 8–14 days after the first session.
  • Figure 4: t-SNE visualization of embeddings. Color represents the average cosine distance to the 5 nearest samples from the same subject but a different session. Warm colors indicate greater distances. Considering all samples, the average intra-class distance is 0.75 and the average inter-class distance is 0.97.
  • Figure 5: t-SNE visualization of embeddings. Colors indicate different EEG headsets used for data collection.
  • ...and 4 more figures
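Figure 2 averages the EER over 50 random draws of $N$ subjects. The sketch below mirrors that protocol on precomputed embeddings. The caption does not say how comparisons are formed or how the EER threshold is found, so this version assumes all within-subset pairs (same subject = genuine, different subjects = impostor), Euclidean distance as the score, and a sweep over every observed distance as the threshold; `compute_eer` and `eer_vs_subjects` are hypothetical names.

```python
import math
import random
import statistics
from itertools import combinations
from typing import Dict, List, Sequence, Tuple


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def compute_eer(genuine: Sequence[float], impostor: Sequence[float]) -> float:
    """EER for distance scores (lower distance = more likely the same subject).

    Sweeps every observed score as the threshold and returns the mean of
    FAR and FRR where the two rates are closest.
    """
    best_gap, eer = float("inf"), 1.0
    for t in sorted(set(genuine) | set(impostor)):
        frr = sum(g > t for g in genuine) / len(genuine)     # genuine pairs rejected
        far = sum(i <= t for i in impostor) / len(impostor)  # impostor pairs accepted
        if abs(far - frr) < best_gap:
            best_gap, eer = abs(far - frr), (far + frr) / 2
    return eer


def eer_vs_subjects(embeddings: Dict[str, List[List[float]]], n_subjects: int,
                    repeats: int = 50, seed: int = 0) -> Tuple[float, float]:
    """Mean and (population) standard deviation of the EER over `repeats`
    random draws of `n_subjects` subjects. Assumes n_subjects >= 2 and at
    least two embeddings per subject so both pair lists are non-empty."""
    rng = random.Random(seed)
    eers = []
    for _ in range(repeats):
        chosen = rng.sample(sorted(embeddings), n_subjects)
        samples = [(s, e) for s in chosen for e in embeddings[s]]
        genuine, impostor = [], []
        for (s1, e1), (s2, e2) in combinations(samples, 2):
            (genuine if s1 == s2 else impostor).append(euclidean(e1, e2))
        eers.append(compute_eer(genuine, impostor))
    return statistics.mean(eers), statistics.pstdev(eers)
```

Calling `eer_vs_subjects` for several values of `n_subjects` yields one (mean, standard deviation) pair per $N$, corresponding to the dot position and dot size in Figure 2.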
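The coloring in Figure 4 is a per-sample, cross-session nearest-neighbor distance. A minimal sketch of that score, assuming cosine distance is defined as 1 minus cosine similarity and that samples arrive as hypothetical (subject, session, embedding) triples; the function names and the NaN convention for samples without a cross-session partner are illustrative choices:

```python
import math
from typing import List, Sequence, Tuple


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """1 - cosine similarity (assumed definition)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (na * nb)


def cross_session_knn_distance(samples: List[Tuple[str, str, List[float]]],
                               k: int = 5) -> List[float]:
    """For each (subject, session, embedding) triple, return the average cosine
    distance to its k nearest samples from the same subject but a different
    session, or NaN when no such sample exists (hypothetical convention)."""
    scores = []
    for subj, sess, emb in samples:
        dists = sorted(cosine_distance(emb, other)
                       for subj2, sess2, other in samples
                       if subj2 == subj and sess2 != sess)
        nearest = dists[:k]
        scores.append(sum(nearest) / len(nearest) if nearest else math.nan)
    return scores
```

Excluding same-session neighbors means the score reflects how consistent a subject's embeddings are across sessions rather than within a single recording.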