Using Motion Forecasting for Behavior-Based Virtual Reality (VR) Authentication

Mingjun Li; Natasha Kholgade Banerjee; Sean Banerjee

TL;DR

This work presents the first approach to use motion forecasting to enhance behavior-based VR authentication. The method forecasts future user motion from an initial trajectory with a Transformer-based Informer model and fuses the forecast with the observed data for authentication. On Miller et al.'s ball-throwing dataset, this reduces the equal error rate (EER) by an average of 23.85% (maximum 36.14%) relative to non-forecasted baselines. The study shows that forecasting enables accurate authentication from shorter observation windows, discusses practical implications for early, low-latency security, addresses potential mimicry risks, and proposes future tests on more diverse VR tasks and datasets. Overall, forecasting-based trajectory prediction offers a promising avenue for strengthening continuous VR authentication when only partial data is available.
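The forecast-then-fuse idea can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' implementation: the forecaster here is a hypothetical placeholder (`linear_extrapolation`, which simply continues the last per-feature step) standing in for the Informer model, and `forecast_and_combine` is likewise a hypothetical name. What it does show is the data flow: an observed window of shape (time, features) is extended by a forecast of `horizon` steps, and the concatenated sequence is what a classifier would receive.

```python
import numpy as np

def linear_extrapolation(window, horizon):
    """Hypothetical placeholder forecaster (NOT the paper's Informer).

    Continues the last per-feature step (velocity) for `horizon` steps.
    window: array of shape (T_obs, F) with T_obs >= 2.
    """
    velocity = window[-1] - window[-2]              # (F,)
    steps = np.arange(1, horizon + 1)[:, None]      # (horizon, 1)
    return window[-1] + steps * velocity            # (horizon, F)

def forecast_and_combine(window, horizon, forecaster=linear_extrapolation):
    """Return the observed window followed by its forecast along the time axis."""
    if window.shape[0] < 2:
        raise ValueError("need at least two observed timesteps")
    future = forecaster(window, horizon)
    if future.shape != (horizon, window.shape[1]):
        raise ValueError("forecaster returned an unexpected shape")
    # Combined sequence of shape (T_obs + horizon, F), passed on to the classifier.
    return np.concatenate([window, future], axis=0)

# Usage: a two-step, two-feature window extended by three forecast steps.
observed = np.array([[0.0, 0.0], [1.0, 2.0]])
combined = forecast_and_combine(observed, horizon=3)
```

Swapping in a learned model only requires passing a different `forecaster` callable with the same (window, horizon) signature. The classifier downstream (an FCN or Transformer Encoder in the paper) is not sketched here.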

Abstract

Task-based behavioral biometric authentication of users interacting in virtual reality (VR) environments enables seamless continuous authentication by using only the motion trajectories of the person's body as a unique signature. Deep learning-based approaches for behavioral biometrics achieve high accuracy when using complete or near-complete user trajectories, but perform worse when given only short segments from the start of the task. Thus, systems built on existing techniques remain vulnerable while they wait for later segments of the motion trajectory to become available. In this work, we present the first approach that predicts future user behavior using Transformer-based forecasting and uses the forecasted trajectory to perform user authentication. Our work leverages the notion that, given a user's current trajectory in a task-based environment, we can predict their future trajectory: users are unlikely to shift their behavior dramatically, since doing so would prevent them from completing the task goal. Using the publicly available 41-subject ball-throwing dataset of Miller et al., we show that forecasted data improves user authentication. When compared to no forecasting, our approach reduces the authentication equal error rate (EER) by an average of 23.85%, with a maximum reduction of 36.14%.
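Since the results are reported as equal error rates, a short sketch of how an EER can be computed from classifier scores may help. This is a generic threshold sweep, not necessarily the evaluation code used in the paper; it assumes higher scores mean "more likely genuine" and approximates the EER at the observed threshold where the false acceptance rate (FAR) and false rejection rate (FRR) are closest. The function name `equal_error_rate` is hypothetical.

```python
import numpy as np

def equal_error_rate(genuine_scores, impostor_scores):
    """Approximate EER: the operating point where FAR and FRR are closest.

    Assumes higher scores indicate a genuine user.
    """
    genuine = np.asarray(genuine_scores, dtype=float)
    impostor = np.asarray(impostor_scores, dtype=float)
    # Candidate thresholds: every observed score.
    thresholds = np.unique(np.concatenate([genuine, impostor]))
    best_gap, eer = np.inf, 1.0
    for t in thresholds:
        far = np.mean(impostor >= t)  # impostors wrongly accepted
        frr = np.mean(genuine < t)    # genuine users wrongly rejected
        gap = abs(far - frr)
        if gap < best_gap:
            best_gap, eer = gap, (far + frr) / 2.0
    return eer

# Usage: perfectly separated scores yield an EER of 0.
eer = equal_error_rate([0.9, 0.8, 0.7], [0.1, 0.2, 0.3])
```

A finer estimate can interpolate between thresholds on the ROC curve; the discrete sweep above keeps the idea visible at the cost of some precision on small score sets.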

Paper Structure

This paper contains 21 sections, 7 equations, 6 figures, and 4 tables.

Introduction
Related Work
Passwords and PINs
Behavioral Biometrics
Dataset
Data Preparation
Impostor Data Generation
Motion Forecasting
Feature Representation
Encoder
Decoder
Authentication
Fully Convolutional Network (FCN)
Transformer Encoder
Loss Functions
...and 6 more sections

Figures (6)

Figure 1: In our approach, we use the ground truth input trajectory to forecast the future trajectory, which is then merged with the input trajectory to authenticate users. When compared to no forecasting, our approach reduces the authentication equal error rate (EER) by an average of 23.85%, with a maximum reduction of 36.14%. The upper portion of the figure outlines our approach, while the lower portion shows the complete ground truth trajectory.
Figure 2: Left: To create the training set for authentication, we evenly sample sliding windows of size $n$ from the day 1 trajectories of the genuine user. To create the impostor set, for each genuine sliding window, we randomly sample a subject and a day 1 trajectory from the remaining users, and select a window from the sampled trajectory at the same temporal location as the genuine sliding window. Right: We repeat the process with day 2 trajectories to create the test set, ensuring that the random ordering of subjects/sessions is different.
Figure 3: Pipeline flowchart of our proposed approach. In the first step, the input data is processed using the sliding window technique to generate sub-sequences. These sub-sequences are fed into the forecasting model, which generates the forecasted sequence. The forecasted sequence is then concatenated with the original input data to form a combined sequence, which is finally fed into the classifier for authentication. $135$, $10$, and $4$ represent the total number of timestamps in the raw data, the number of sessions, and the number of features for each session, respectively.
Figure 4: (a) An FCN and (b) a Transformer Encoder used for authentication. (c) The modified Transformer we use for forecasting.
Figure 5: The input to the Encoder consists of the initial sequence (in gray) and the overlap sequence (in green), and the Decoder input consists of the overlap sequence (in green) and the sequence to be forecasted initialized with zeros (in red).
...and 1 more figure
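The window sampling described for Figure 2 and the encoder/decoder input layout described for Figure 5 can be sketched as follows. This is a minimal illustration, not the authors' code. Assumptions: one day of data is stored as an array of shape (subjects, trajectories, T, F); windows start at evenly spaced offsets with a hypothetical `stride`; and the overlap segment is the trailing part of the observed sequence, as in the Informer "start token" design. All function names are hypothetical.

```python
import numpy as np

def window_starts(T, n, stride):
    """Evenly spaced start indices for length-n windows in a length-T trajectory."""
    return list(range(0, T - n + 1, stride))

def build_window_sets(day_data, genuine_id, n, stride, rng):
    """Genuine and impostor windows for one user from one day of data.

    day_data: assumed layout (subjects, trajectories, T, F).
    """
    num_subjects, num_traj, T, _ = day_data.shape
    others = [s for s in range(num_subjects) if s != genuine_id]
    genuine, impostor = [], []
    for k in range(num_traj):
        for start in window_starts(T, n, stride):
            genuine.append(day_data[genuine_id, k, start:start + n])
            # Impostor: random other subject and trajectory, same temporal location.
            s = others[rng.integers(len(others))]
            j = rng.integers(num_traj)
            impostor.append(day_data[s, j, start:start + n])
    return np.stack(genuine), np.stack(impostor)

def build_informer_inputs(observed, overlap_len, horizon):
    """Encoder input = initial + overlap; decoder input = overlap + zeros."""
    if not 0 < overlap_len <= observed.shape[0]:
        raise ValueError("overlap_len must be in (0, observed length]")
    overlap = observed[-overlap_len:]  # the "green" segment in Figure 5
    # Zero-initialized slots for the steps to be forecasted ("red" segment).
    placeholder = np.zeros((horizon, observed.shape[1]), dtype=observed.dtype)
    return observed, np.concatenate([overlap, placeholder], axis=0)

# Usage with random stand-in data (not the real dataset).
rng_train = np.random.default_rng(0)
day1 = rng_train.normal(size=(5, 3, 20, 4))
train_gen, train_imp = build_window_sets(day1, genuine_id=0, n=10, stride=5, rng=rng_train)
enc_in, dec_in = build_informer_inputs(train_gen[0], overlap_len=4, horizon=6)
```

Building the day 2 test set would mean calling `build_window_sets` on day 2 data with a differently seeded generator, so that the random ordering of impostor subjects and sessions differs from training.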