BehaVR: User Identification Based on VR Sensor Data

Ismat Jarin, Yu Duan, Rahmadi Trimananda, Hao Cui, Salma Elmalaki, Athina Markopoulou

TL;DR

BehaVR is the first work to analyze user identification in VR comprehensively, i.e., considering all sensor measurements available on consumer VR devices, as collected by multiple real-world (as opposed to custom-made) apps.

Abstract

Virtual reality (VR) platforms enable a wide range of applications; however, they also pose unique privacy risks. In particular, VR devices are equipped with a rich set of sensors that collect personal and sensitive information (e.g., body motion, eye gaze, hand joints, and facial expression). The data from these newly available sensors can be used to uniquely identify a user, even in the absence of explicit identifiers. In this paper, we seek to understand the extent to which a user can be identified based solely on VR sensor data, within and across real-world apps from diverse genres. We consider adversaries with capabilities that range from observing APIs available within a single app (app adversary) to observing all or selected sensor measurements across multiple apps on the VR device (device adversary). To that end, we introduce BehaVR, a framework for collecting and analyzing data from all sensor groups collected by multiple apps running on a VR device. We use BehaVR to collect data from real users who interact with 20 popular real-world apps. We use that data to build machine learning models for user identification within and across apps, with features extracted from the available sensor data. We show that these models can identify users with an accuracy of up to 100%, and we reveal the most important features and sensor groups, depending on the functionality of the app and the adversary. To the best of our knowledge, BehaVR is the first work to analyze user identification in VR comprehensively, i.e., considering all sensor measurements available on consumer VR devices, as collected by multiple real-world (as opposed to custom-made) apps.
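The two adversaries described in the abstract differ only in what data they can see. The following is a minimal sketch, assuming each block of sensor data has already been turned into a dictionary of named features whose names are prefixed by their sensor group (BM, EG, HJ, FE). The names `Sample`, `adversary_view`, `fit_centroids`, and `identify` are hypothetical and do not come from BehaVR, and the nearest-centroid classifier is a simple dependency-free stand-in, not the machine learning models evaluated in the paper.

```python
from collections import defaultdict
from dataclasses import dataclass
from math import dist
from typing import Dict, List, Optional, Sequence

# Sensor groups named in the paper; the "GROUP_" feature-name prefix is an assumption.
SENSOR_GROUPS = ("BM", "EG", "HJ", "FE")


@dataclass
class Sample:
    user: str                   # identity label the adversary tries to predict
    app: str                    # app that produced this block of sensor data
    features: Dict[str, float]  # hypothetical names, e.g. "BM_rot_x_mean"


def adversary_view(samples: List[Sample],
                   apps: Optional[Sequence[str]] = None,
                   groups: Sequence[str] = SENSOR_GROUPS) -> List[Sample]:
    """Keep only the data an adversary can observe.

    App adversary: apps=[one app], groups=the sensor groups that app's APIs expose.
    Device adversary: apps=None (all apps), groups=all or selected sensor groups.
    """
    view = []
    for s in samples:
        if apps is not None and s.app not in apps:
            continue  # this adversary does not observe this app
        kept = {k: v for k, v in s.features.items() if k.split("_", 1)[0] in groups}
        view.append(Sample(s.user, s.app, kept))
    return view


def fit_centroids(train: List[Sample]) -> Dict[str, Dict[str, float]]:
    """Average the feature vectors of each user (a nearest-centroid 'model')."""
    sums: Dict[str, Dict[str, float]] = defaultdict(lambda: defaultdict(float))
    counts: Dict[str, int] = defaultdict(int)
    for s in train:
        counts[s.user] += 1
        for k, v in s.features.items():
            sums[s.user][k] += v
    return {u: {k: v / counts[u] for k, v in f.items()} for u, f in sums.items()}


def identify(centroids: Dict[str, Dict[str, float]], sample: Sample) -> str:
    """Return the user whose centroid is closest (Euclidean) to the sample."""
    keys = sorted(sample.features)
    x = [sample.features[k] for k in keys]
    return min(centroids,
               key=lambda u: dist(x, [centroids[u].get(k, 0.0) for k in keys]))
```

Restricting `apps` to a single app models the app adversary, while passing `apps=None` together with all or selected sensor groups models the device adversary; the same training and identification code then runs on either view.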

Paper Structure

This paper contains 51 sections, 10 figures, and 13 tables.

Figures (10)

  • Figure 1: The BehaVR problem space spans several dimensions: users, apps, and sensors. We consider four sensor groups: body motion (BM), eye gaze (EG), hand joints (HJ), and facial expression (FE). We consider 20 real-world apps covering a wide range of VR app domains. We consider two types of adversaries: the app adversary has access to only one app; the device adversary has access across multiple apps. We further define app groups as sets of apps with similar activities and emotional states.
  • Figure 2: Overview of BehaVR. (1) Data Collection Setup: every user interacts with each app using a Quest Pro; each app (e.g., Beat Saber) runs on a PC and its VR environment is rendered on the Quest Pro headset, which enables recording the sensor data sent from the Quest Pro to the PC; apps are grouped based on the similarity of their activities and emotional states. (2) Data Processing: there are four sensor groups, namely body motion, eye gaze, hand joints, and facial expression; we divide the time series generated by every sensor group into blocks and compute 5 statistics per block as features. (3) Model Training & Evaluation: using these per-block features, we train different models (using data per app, across apps, and per group of apps) that an adversary can use to uniquely identify users.
  • Figure 3: Session durations. There are 20 users, each of whom interacts with 20 apps. Colors represent app groups.
  • Figure 4: FBA illustration for the $x$ value of headset rotation from the BM sensor group.
  • Figure 5: Identification accuracy (%) in the zero-day scenario. The adversary trains on data from the other apps in a group and tests on a new app in the same group, for which it has no training data. The diagonal shows the accuracy for apps within the same group, whereas the other values show the accuracy for apps from other app groups.
  • ...and 5 more figures
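The per-block feature extraction described for Figure 2 (and illustrated in Figure 4 for the x value of headset rotation) and the within-group zero-day split described for Figure 5 can be sketched as follows. This is an illustration under assumptions, not BehaVR's implementation: the captions say 5 statistics per block but do not name them here, so max, min, mean, median, and standard deviation are placeholders; non-overlapping blocks and dropping a trailing partial block are likewise assumptions. `block_features` and `zero_day_splits` are hypothetical names.

```python
from statistics import mean, median, pstdev
from typing import Callable, Dict, Iterator, List, Sequence, Tuple

# ASSUMPTION: illustrative choice of the 5 per-block statistics.
STATS: Dict[str, Callable[[Sequence[float]], float]] = {
    "max": max,
    "min": min,
    "mean": mean,
    "median": median,
    "std": pstdev,  # population standard deviation
}


def block_features(series: Sequence[float], block_size: int) -> List[Dict[str, float]]:
    """Split one sensor time series (e.g. the x value of headset rotation) into
    non-overlapping blocks and summarize each block with five statistics.

    ASSUMPTION: a trailing partial block shorter than block_size is dropped.
    """
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    starts = range(0, len(series) - block_size + 1, block_size)
    return [
        {name: float(fn(series[i:i + block_size])) for name, fn in STATS.items()}
        for i in starts
    ]


def zero_day_splits(
    app_groups: Dict[str, List[str]],
) -> Iterator[Tuple[str, str, List[str]]]:
    """Yield (group, held_out_app, training_apps) for the within-group zero-day
    setting: train on the other apps of a group, test on the held-out app."""
    for group, apps in app_groups.items():
        for held_out in apps:
            yield group, held_out, [a for a in apps if a != held_out]
```

Each yielded split would train an identification model on the blocks from `training_apps` and evaluate it on `held_out_app`, which corresponds to the diagonal of Figure 5; the off-diagonal entries (testing on apps from other groups) would need an additional cross-group loop that this sketch does not include.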