TempCharBERT: Keystroke Dynamics for Continuous Access Control Based on Pre-trained Language Models

Matheus Simão, Fabiano Prado, Omar Abdul Wahab, Anderson Avila

TL;DR

TempCharBERT is proposed: an architecture that incorporates temporal-character information in the embedding layer of CharBERT, allowing keystroke dynamics to be modeled for user identification and authentication. This customization yields a significant improvement.

Abstract

With the spread of digital environments, reliable authentication and continuous access control have become crucial. They can minimize cyber attacks and prevent fraud, especially fraud associated with identity theft. Of particular interest is keystroke dynamics (KD), the task of recognizing individuals' identity based on their unique typing style. In this work, we propose the use of pre-trained language models (PLMs) to recognize such patterns. Although PLMs have shown high performance on multiple NLP benchmarks, using these models on specific tasks requires customization. BERT and RoBERTa, for instance, rely on subword tokenization and cannot be directly applied to KD, which requires temporal-character information to recognize users. Recent character-aware PLMs can process both subword and character-level information and offer an alternative. Nevertheless, they are still not suitable for direct fine-tuning on KD, as they are not optimized to account for a user's temporal typing information (e.g., hold time and flight time). To overcome this limitation, we propose TempCharBERT, an architecture that incorporates temporal-character information in the embedding layer of CharBERT. This allows modeling keystroke dynamics for the purpose of user identification and authentication. Our results show a significant improvement with this customization. We also show the feasibility of training TempCharBERT in a federated learning setting in order to foster data privacy.

Paper Structure

This paper contains 19 sections, 6 equations, 4 figures, 3 tables.

Figures (4)

  • Figure 1: Keystroke metrics based on pressing and releasing timestamps, including latency interval, dwell (or hold) time and flight time.
  • Figure 2: Comparison of the contextual word representation in CharBERT and the TempCharBERT architecture with the proposed Temporal-character Encoder for keystroke dynamics.
  • Figure 3: Temporal-character Encoder that captures full word information from characters that comprise temporal information.
  • Figure 4: t-SNE visualization of CharBERT embeddings and TempCharBERT embeddings. Temporal keystroke information helps to capture user biometrics.
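
The keystroke metrics named in the Figure 1 caption can be illustrated with a minimal sketch. It assumes the commonly used definitions: hold (dwell) time is release minus press of the same key, flight time is the next key's press minus the current key's release, and latency is taken here as press-to-press. The paper's exact definitions may differ, and the `KeyEvent` type, the `keystroke_features` function, and the millisecond timestamps are hypothetical names and values chosen for illustration only.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class KeyEvent:
    """One keystroke (hypothetical structure, not from the paper)."""
    key: str
    press: float    # press timestamp, in milliseconds
    release: float  # release timestamp, in milliseconds


def keystroke_features(events: List[KeyEvent]) -> List[dict]:
    """Compute hold time per key, plus flight time and latency to the next key."""
    features = []
    for i, ev in enumerate(events):
        # Hold (dwell) time: how long the key stayed pressed.
        row = {"key": ev.key, "hold": ev.release - ev.press}
        if i + 1 < len(events):
            nxt = events[i + 1]
            # Flight time: gap between releasing this key and pressing the next.
            # It can be negative when the next key is pressed before release.
            row["flight"] = nxt.press - ev.release
            # Latency (assumed press-to-press here): time between consecutive presses.
            row["latency"] = nxt.press - ev.press
        features.append(row)
    return features


# Illustrative timestamps only; not data from the paper.
events = [KeyEvent("h", 0.0, 95.0), KeyEvent("i", 140.0, 220.0)]
feats = keystroke_features(events)
print(feats)
```

The last key in a sequence has no successor, so it carries only a hold time. Per-character values of this kind are the sort of temporal information the abstract describes as being combined with character information in the embedding layer.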