psiUnity: A Platform for Multimodal Data-Driven XR

Akhil Ajikumar, Sahil Mayenkar, Steven Yoo, Sakib Reza, Mohsen Moghaddam

TL;DR

The paper tackles the challenge of achieving precise temporal alignment across heterogeneous XR sensors within Unity-driven workflows. It introduces psiUnity, a native C# integration that embeds psi's deterministic multimodal processing into Unity 2022.3/MRTK3 for HoloLens 2, enabling real-time, time-synchronized streaming of data from sources such as AHAT/Long-Throw depth, IMU, head pose, gaze, and hand tracking. The system preserves high-precision timestamps and provides native psi serialization, logging, and replay within Unity, enhancing reproducibility for data-driven XR experiments. By bridging psi with the Unity ecosystem and extending beyond StereoKit, psiUnity broadens access to deterministic multimodal data processing for HRI, HCI, and embodied-AI research in immersive environments. The work is open-source and NSF-supported, facilitating reproducible, scalable experiments and richer analyses in Unity-based XR research.

Abstract

Extended reality (XR) research increasingly relies on the ability to stream and synchronize multimodal data between headsets and immersive applications for data-driven interaction and experimentation. However, developers face a critical gap: the Platform for Situated Intelligence (psi), which excels at deterministic temporal alignment and multimodal data management, has been largely inaccessible to the dominant Unity/MRTK ecosystem used for HoloLens development. We introduce psiUnity, an open-source C# integration that bridges psi's .NET libraries with Unity 2022.3 and MRTK3 for HoloLens 2. psiUnity enables bidirectional, real-time streaming of head pose, hand tracking, gaze, IMU, audio, and depth sensor data (AHAT and long-throw) with microsecond-level temporal precision, allowing Unity applications to both consume and produce synchronized multimodal data streams. By embedding psi's native serialization, logging, and temporal coordination directly within Unity's architecture, psiUnity extends psi beyond its previous StereoKit limitations and empowers the HRI, HCI, and embodied-AI communities to develop reproducible, data-driven XR interactions and experiments within the familiar Unity environment. The integration is available at https://github.com/sailgt/psiUnity.
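To make the idea of temporal alignment concrete, the following is a minimal sketch of a nearest-timestamp join between two sensor streams. It is not psi's or psiUnity's API; the function name `align_nearest`, the `tolerance_us` parameter, the microsecond integer timestamps, and the sample values are all assumptions chosen for illustration.

```python
from bisect import bisect_left


def align_nearest(primary, secondary, tolerance_us):
    """Pair each primary sample with the nearest secondary sample in time.

    Both inputs are lists of (timestamp_us, value) tuples sorted by timestamp.
    Primary samples with no secondary sample within tolerance_us are dropped.
    (Hypothetical helper for illustration; not part of psi or psiUnity.)
    """
    sec_times = [t for t, _ in secondary]
    joined = []
    for t, value in primary:
        i = bisect_left(sec_times, t)
        # Only the neighbours just before and just after t can be nearest.
        candidates = [j for j in (i - 1, i) if 0 <= j < len(secondary)]
        if not candidates:
            continue
        best = min(candidates, key=lambda j: abs(sec_times[j] - t))
        if abs(sec_times[best] - t) <= tolerance_us:
            joined.append((t, value, secondary[best][1]))
    return joined


# Made-up samples: a ~60 Hz head-pose stream and an irregular IMU stream.
head_pose = [(0, "pose0"), (16_667, "pose1"), (33_333, "pose2")]
imu = [(100, "imu0"), (16_000, "imu1"), (40_000, "imu2")]

pairs = align_nearest(head_pose, imu, tolerance_us=1_000)
for row in pairs:
    print(row)
```

With these samples, the third head-pose sample is dropped because the closest IMU sample lies outside the 1 ms tolerance. A real pipeline would likely also offer interpolation and handle clock differences between devices, which this sketch does not attempt.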

Paper Structure

This paper contains 7 sections, 1 figure.

Figures (1)

  • Figure 1: System pipeline and schematic overview of the psiUnity integration framework. The system bridges multimodal sensor data from the HoloLens 2 (Research Mode) into the Unity 2022.3 environment via native C# components and psi's .NET libraries. The base infrastructure for initializing and managing psi pipelines is provided by the PsiCaptureController, while specific data streams (such as IMU, head pose, eye gaze, hand tracking, RGB video, and AHAT and long-throw depth) are defined and configured within its subclass DefaultPsiCaptureController. The HoloLens 2 captures these diverse sensor streams, which are timestamped and processed directly within Unity using psi's native serialization format. This enables synchronized multimodal logging and replay for XR research, with psi's deterministic temporal alignment natively embedded into Unity applications. The inset shows a first-person view during the assembly task, where the AR interface allows users to start or stop specific data streams (e.g., RGB, depth, IMU, audio) before initiating multimodal data capture.