A Toolkit for Virtual Reality Data Collection

Tim Rolff, Niklas Hypki, Markus Lappe, Frank Steinicke

TL;DR

This work tackles the scarcity of large-scale, multimodal VR datasets by introducing the OpenXR Data Recorder (OXDR) toolkit built on Unity3D. It delivers a frame-rate-independent data capture workflow, an extensible data format (NDJSON or MessagePack), a recording component, and a Python analysis toolkit to enable efficient ML-ready VR data collection. Key contributions include a structured data model (Metadata, Snapshot, Device, Feature types), ethical data-handling guidelines aligned with GDPR, and standardized surveys to link qualitative and quantitative signals. The framework aims to democratize large VR datasets, enabling robust ML, psychological modeling, and data-analysis methods across VR research domains.

Abstract

Because the number of VR users is still relatively low, acquiring large-scale, multidimensional virtual reality datasets remains a significant challenge. Consequently, VR datasets comparable in size to state-of-the-art collections in natural language processing or computer vision are rare or absent. The availability of such datasets could, however, unlock substantial advances in deep learning, psychological modeling, and data analysis in the context of VR. In this paper, we present a versatile data collection toolkit designed to facilitate the capture of extensive VR datasets. Our toolkit integrates with any device, either directly via OpenXR or through the use of a virtual device. Additionally, we introduce a robust data collection pipeline that emphasizes ethical practices (e.g., ensuring data protection and regulatory compliance) and ensures a standardized, reproducible methodology.

Paper Structure

This paper contains 15 sections, 4 figures, 1 table.

Figures (4)

  • Figure 1: Data format used to store the captured data of our recording toolkit. See Section "Data Format" for more details.
  • Figure 2: Example showing the data format. The first entry is always a metadata structure (cf. Section "Metadata"), followed by multiple snapshots (cf. Section "Snapshot"), each containing an array of devices. Note that a snapshot does not need to contain the same devices every update cycle. Furthermore, each device contains a set of features (cf. Section "controls"), which are likewise not restricted to the same layout every update cycle.
  • Figure 3: Predefined study procedure when capturing data through our toolkit.
  • Figure 4: Overall procedure for our toolkit.
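
The layout described in the Figure 2 caption (one metadata entry, then snapshots holding devices, each with its own set of features) can be illustrated with a minimal Python sketch for the NDJSON variant. All field names here (`type`, `time`, `devices`, `name`, `features`) and the sample values are assumptions made for illustration; they are not taken from the toolkit's actual schema, and the helper functions are hypothetical rather than part of its Python analysis toolkit.

```python
import json

# Hypothetical NDJSON recording. Field names and values are illustrative
# assumptions, not the toolkit's real schema.
SAMPLE = "\n".join([
    json.dumps({"type": "metadata", "application": "demo", "version": 1}),
    json.dumps({"type": "snapshot", "time": 0.000, "devices": [
        {"name": "HMD", "features": {"position": [0.0, 1.7, 0.0]}},
    ]}),
    json.dumps({"type": "snapshot", "time": 0.011, "devices": [
        {"name": "HMD", "features": {"position": [0.0, 1.7, 0.1],
                                     "rotation": [0, 0, 0, 1]}},
        {"name": "LeftController", "features": {"trigger": 0.5}},
    ]}),
])


def parse_recording(text):
    """Split an NDJSON recording into (metadata, snapshots).

    The first non-empty line is treated as the metadata entry and every
    following line as a snapshot, mirroring the ordering in the caption.
    """
    records = [json.loads(line) for line in text.splitlines() if line.strip()]
    if not records:
        raise ValueError("empty recording")
    return records[0], records[1:]


def features_by_device(snapshots):
    """Collect, per device name, every feature name seen in any snapshot.

    Devices and features may differ between snapshots, so the result is
    the union over the whole recording.
    """
    seen = {}
    for snapshot in snapshots:
        for device in snapshot.get("devices", []):
            seen.setdefault(device["name"], set()).update(device.get("features", {}))
    return seen


metadata, snapshots = parse_recording(SAMPLE)
layout = features_by_device(snapshots)
```

Taking the union of features per device, rather than assuming a fixed column layout, reflects the caption's point that neither the device list nor the feature set is guaranteed to stay the same between update cycles.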