Effect of Duration and Delay on the Identifiability of VR Motion

Mark Roman Miller, Vivek Nair, Eugy Han, Cyan DeVeaux, Christian Rack, Rui Wang, Brandon Huang, Marc Erich Latoschik, James F. O'Brien, Jeremy N. Bailenson

TL;DR

Training data duration and train-test delay affect identifiability; minimal train-test delay leads to very high accuracy; and train-test delay should be controlled in future experiments.

Abstract

Social virtual reality is an emerging medium of communication. In this medium, a user's avatar (virtual representation) is controlled by the tracked motion of the user's headset and hand controllers. This tracked motion is a rich data stream that can leak characteristics of the user or can be matched to previously identified data to identify a user. To better understand the boundaries of motion data identifiability, we investigate how varying training data duration and train-test delay affects the accuracy with which a machine learning model can classify user motion in a supervised learning task simulating re-identification. The dataset we use has a unique combination of a large number of participants, long duration per session, large number of sessions, and a long time span over which sessions were conducted. We find that training data duration and train-test delay affect identifiability; that minimal train-test delay leads to very high accuracy; and that train-test delay should be controlled in future experiments.

Paper Structure (20 sections, 2 figures, 1 table)


Figures (2)

  • Figure 1: Separating the training and testing sets by a longer time reduces accuracy. The x-axis and y-axis show the testing and training weeks, respectively. The panels are colored by identifiability (operationalized as multiclass AUC), with yellow indicating higher accuracy. Multiclass AUC tends to be higher along the diagonal (i.e., minimal delay).
  • Figure 2: The number of sessions and the duration of each session affect identifiability, operationalized as multiclass AUC. The two panels, arranged horizontally, indicate whether the comparison is drawn between sessions or within the same session. Within each panel, the x-axis indicates the training duration per session and the y-axis indicates the number of sessions. The rectangles are colored by identifiability, with yellow indicating higher accuracy.
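
The captions above operationalize identifiability as multiclass AUC over pairs of training and testing weeks. As a rough illustration only, the sketch below computes a macro-averaged one-vs-rest AUC for each (training week, testing week) pair and records the delay between them. The page does not say which multiclass AUC variant the authors use, so the one-vs-rest average is an assumption, and the names `binary_auc`, `multiclass_auc`, and `delay_grid` are hypothetical helpers, not the authors' code.

```python
from itertools import product


def binary_auc(scores, labels):
    """AUC as the probability that a random positive outranks a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p, n in product(pos, neg)
    )
    return wins / (len(pos) * len(neg))


def multiclass_auc(score_rows, true_users, users):
    """Macro average of one-vs-rest AUCs (an assumed variant, see lead-in).

    score_rows[i][k] is the model's score that sample i belongs to users[k].
    """
    aucs = []
    for k, user in enumerate(users):
        scores = [row[k] for row in score_rows]
        labels = [t == user for t in true_users]
        aucs.append(binary_auc(scores, labels))
    return sum(aucs) / len(aucs)


def delay_grid(predictions):
    """Map each (train_week, test_week) pair to its delay and multiclass AUC.

    predictions: {(train_week, test_week): (score_rows, true_users, users)}
    """
    return {
        (train_week, test_week): {
            "delay_weeks": abs(test_week - train_week),
            "auc": multiclass_auc(*payload),
        }
        for (train_week, test_week), payload in predictions.items()
    }
```

Plotting the `auc` values of such a grid with training week on one axis and testing week on the other would give a heatmap of the kind Figure 1 describes, where the diagonal corresponds to zero delay.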