XR Blocks: Accelerating Human-centered AI + XR Innovation

David Li, Nels Numan, Xun Qian, Yanhe Chen, Zhongyi Zhou, Evgenii Alekseev, Geonsun Lee, Alex Cooper, Min Xia, Scott Chung, Jeremy Nelson, Xiuxiu Yuan, Jolica Dias, Tim Bettridge, Benjamin Hersh, Michelle Huynh, Konrad Piascik, Ricardo Cabello, David Kim, Ruofei Du

TL;DR

XR Blocks addresses the fragmentation between AI ecosystems and XR prototyping by introducing a cross-platform framework built on WebXR, three.js, TensorFlow, and Gemini. It centers on a Reality Model and a Core Engine to separate the what of an interaction from the how of its low-level implementation, enabling rapid, human-centered AI + XR prototyping. The paper details design principles, architectural components, and demonstrative applications (XR Realism, Intelligent Environments) to illustrate scalable, interactive, real-time experiences that fuse perception, interaction, and AI. It also discusses limitations and future directions, including learnable interaction grammars, differentiable realities, and privacy considerations, outlining a path toward an ecosystem where ideas can be translated into interactive XR realities with minimal friction.

Abstract

We are on the cusp of a convergence between Artificial Intelligence (AI) and Extended Reality (XR) that will unlock new paradigms of interactive computing. However, a significant gap exists between the ecosystems of these two fields: while AI research and development is accelerated by mature frameworks like JAX and benchmarks like LMArena, prototyping novel AI-driven XR interactions remains a high-friction process, often requiring practitioners to manually integrate disparate, low-level systems for perception, rendering, and interaction. To bridge this gap, we present XR Blocks, a cross-platform framework designed to accelerate human-centered AI + XR innovation. XR Blocks strives to provide a modular architecture with plug-and-play components for the core abstractions in AI + XR: user, world, peers; interface, context, and agents. Crucially, it is designed with the mission of "reducing frictions from idea to reality", thus accelerating rapid prototyping of AI + XR apps. Built upon accessible technologies (WebXR, three.js, TensorFlow, Gemini), our toolkit lowers the barrier to entry for XR creators. We demonstrate its utility through a set of open-source templates, samples, and advanced demos, empowering the community to quickly move from concept to interactive XR prototype. Site: https://xrblocks.github.io

Paper Structure

This paper contains 18 sections and 5 figures.

Figures (5)

  • Figure 1: XR Blocks roadmap. This initial version of xrblocks.js serves as an open-source framework to accelerate prototyping with WebAI and WebXR. It is intended to be iterated on to enable vibe coding for XR. We envision this framework achieving "idea to reality" in XR while following human-centered objectives [Chen2023Next]: (i) aligning with human values in XR; (ii) assimilating human intents in XR; and (iii) augmenting human abilities in XR.
  • Figure 2: The conceptual Reality Model of the XR Blocks framework. At the center, the Script contains the application's logic and operates on a unified model of first-class primitives including the user, the physical world, AI agents, and the application context. Entities marked with an asterisk (*) have not yet been fully implemented in the GitHub repository.
  • Figure 3: The modular architecture of the XR Blocks Script engine. The core engine consists of essential subsystems that realize the framework's high-level abstractions, spanning perception (depth, input), AI integration (ai, agent), and user experience (ui, ux). Subsystems marked with an asterisk (*) have not yet been fully implemented in the GitHub repository. While XR Blocks includes a few examples, many implementations are far from perfect.
  • Figure 4: The Interaction Grammar of XR Blocks, which abstracts user input by distinguishing between two types of interaction. Explicit events are direct, low-level inputs (e.g., a touch or click), while implicit intents are higher-level interpretations (e.g., a gesture or voice command), allowing creators to build interaction against user intent.
  • Figure 5: Applications of XR Blocks. (1) XR Realism: depth-aware and physics-based ballpit and splash games; geometry-aware shadows, 3D Gaussian splatting with occlusion, and lighting estimation. (2) XR Interaction: immersive emoji and rock-paper-scissors games powered by custom ML models, dynamic swipe recognition, and touch and grab with the physical world. (3) AI + XR: integration with Gemini Live, XR objects, glasses simulation in XR, and poem generation with a real-world camera.
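
The distinction drawn in the Figure 4 caption, between explicit events and implicit intents, can be illustrated with a small TypeScript sketch. This is not the XR Blocks API: every type and function name below (ExplicitEvent, ImplicitIntent, describeInteraction, the confidence field and its 0.8 threshold) is a hypothetical chosen for illustration. The sketch only assumes that explicit events are direct inputs that can be acted on immediately, while implicit intents are interpretations that may carry some uncertainty.

```typescript
// Hypothetical types for illustration; none of these names come from XR Blocks.

// Explicit events: direct, low-level inputs such as a touch or click.
type ExplicitEvent =
  | { kind: "touch"; target: string }
  | { kind: "click"; target: string };

// Implicit intents: higher-level interpretations such as a gesture or voice
// command. The confidence score is an assumption made for this sketch.
type ImplicitIntent =
  | { kind: "gesture"; name: string; confidence: number }
  | { kind: "voice"; command: string; confidence: number };

type Interaction = ExplicitEvent | ImplicitIntent;

// Type guard that separates the two classes of interaction.
function isImplicit(i: Interaction): i is ImplicitIntent {
  return i.kind === "gesture" || i.kind === "voice";
}

// Explicit events are handled directly; implicit intents are handled only
// when the interpretation is confident enough.
function describeInteraction(i: Interaction, minConfidence = 0.8): string {
  if (isImplicit(i)) {
    if (i.confidence < minConfidence) {
      return "ignored: low-confidence intent";
    }
    return i.kind === "gesture"
      ? `intent: gesture ${i.name}`
      : `intent: voice ${i.command}`;
  }
  return `event: ${i.kind} on ${i.target}`;
}
```

Under these assumptions, application logic would subscribe to intents such as a recognized swipe rather than to the raw input stream, which is one way to read the caption's point about building interaction against user intent.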