A Toolkit for Virtual Reality Data Collection
Tim Rolff, Niklas Hypki, Markus Lappe, Frank Steinicke
TL;DR
This work tackles the scarcity of large-scale, multimodal VR datasets by introducing the OpenXR Data Recorder (OXDR) toolkit built on Unity3D. It delivers a frame-rate-independent data capture workflow, an extensible data format (NDJSON or MessagePack), a recording component, and a Python analysis toolkit to enable efficient ML-ready VR data collection. Key contributions include a structured data model (Metadata, Snapshot, Device, Feature types), ethical data-handling guidelines aligned with GDPR, and standardized surveys to link qualitative and quantitative signals. The framework aims to democratize large VR datasets, enabling robust ML, psychological modeling, and data-analysis methods across VR research domains.
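As a rough illustration of how an NDJSON recording (one JSON object per line) could be consumed from Python, the sketch below parses a stream and groups records by kind. The field names `type`, `time`, `app`, and `devices`, and the helpers `read_ndjson` and `group_by_type`, are hypothetical and chosen for illustration only. They are not the OXDR schema or the API of its analysis toolkit.

```python
import io
import json
from collections import defaultdict


def read_ndjson(stream):
    """Yield one parsed JSON object per non-empty line of a text stream."""
    for line in stream:
        line = line.strip()
        if line:  # skip blank lines
            yield json.loads(line)


def group_by_type(records, key="type"):
    """Group records by a discriminator field (hypothetical field name)."""
    groups = defaultdict(list)
    for record in records:
        groups[record.get(key, "unknown")].append(record)
    return dict(groups)


# Hypothetical sample recording; a real file would come from the recorder.
SAMPLE = "\n".join([
    json.dumps({"type": "metadata", "app": "demo"}),
    json.dumps({"type": "snapshot", "time": 0.011, "devices": []}),
    "",
    json.dumps({"type": "snapshot", "time": 0.022, "devices": []}),
])

groups = group_by_type(read_ndjson(io.StringIO(SAMPLE)))
snapshot_times = [snapshot["time"] for snapshot in groups["snapshot"]]
```

Because each line is parsed on its own, a long recording can be streamed without loading the whole file into memory. Under the same assumptions, a MessagePack recording would need a different decoder but could reuse the grouping step.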
Abstract
Because virtual reality (VR) still has relatively few users, acquiring large-scale, multidimensional VR datasets remains a significant challenge. Consequently, VR datasets comparable in size to state-of-the-art collections in natural language processing or computer vision are rare or absent. However, the availability of such datasets could unlock groundbreaking advancements in deep learning, psychological modeling, and data analysis in the context of VR. In this paper, we present a versatile data collection toolkit designed to facilitate the capture of extensive VR datasets. Our toolkit seamlessly integrates with any device, either directly via OpenXR or through the use of a virtual device. Additionally, we introduce a robust data collection pipeline that emphasizes ethical practices (e.g., ensuring data protection and regulatory compliance) and ensures a standardized, reproducible methodology.
