SpaceBlender: Creating Context-Rich Collaborative Spaces Through Generative 3D Scene Blending

Nels Numan, Shwetha Rajaram, Balasaravanan Thoravi Kumaravel, Nicolai Marquardt, Andrew D. Wilson

TL;DR

SpaceBlender addresses the challenge of grounding VR telepresence environments in users' real contexts by blending multiple input spaces into one cohesive 3D space. The pipeline has two stages. Stage 1 builds 2D-to-3D submeshes, aligns their floors, arranges the submeshes in a circle-based layout, and constructs a geometric prior. Stage 2 performs diffusion-based space completion guided by the geometric prior and by contextual prompts from VLMs and LLMs, using an expanded inpainting context (512×1280) and a trained ControlNet-Layout model. In a preliminary within-subjects study with 20 participants, SpaceBlender improved familiarity, self-location, and navigability compared to Text2Room, though both generative environments exhibited texture and geometry artifacts and would need higher realism for broader adoption. The work contributes the pipeline, a preliminary evaluation, and directions for improving realism, aligning with real-world spaces, and enabling explicit collaborative interactions in blended VR spaces.
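
As a rough illustration of what a circle-based submesh layout could look like, the sketch below spaces a number of submeshes evenly around a circle and turns each one to face the center. This is an assumption made for illustration only: the summary does not specify the actual layout rule, and the function name `circle_layout`, the `radius` parameter, and the (x, z, yaw) convention are all hypothetical rather than taken from the SpaceBlender implementation.

```python
import math


def circle_layout(num_submeshes, radius=3.0):
    """Hypothetical layout: place submeshes evenly on a circle in the floor plane.

    Returns a list of (x, z, yaw) tuples, where yaw is the rotation about the
    vertical axis (in radians) that points the submesh toward the circle center.
    """
    if num_submeshes < 1:
        raise ValueError("num_submeshes must be at least 1")
    if radius <= 0:
        raise ValueError("radius must be positive")

    placements = []
    for i in range(num_submeshes):
        # Evenly spaced angle for the i-th submesh.
        angle = 2.0 * math.pi * i / num_submeshes
        x = radius * math.cos(angle)
        z = radius * math.sin(angle)
        # Heading of the vector from the submesh position to the origin.
        yaw = math.atan2(-z, -x)
        placements.append((x, z, yaw))
    return placements


if __name__ == "__main__":
    for x, z, yaw in circle_layout(3, radius=2.5):
        print(f"x={x:.2f} z={z:.2f} yaw={math.degrees(yaw):.1f} deg")
```

Under these assumptions, two input rooms would end up on opposite sides of the circle facing each other, which leaves the region between them open for the Stage 2 completion step to fill.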

Abstract

There is increased interest in using generative AI to create 3D spaces for Virtual Reality (VR) applications. However, today's models produce artificial environments, falling short of supporting collaborative tasks that benefit from incorporating the user's physical context. To generate environments that support VR telepresence, we introduce SpaceBlender, a novel pipeline that utilizes generative AI techniques to blend users' physical surroundings into unified virtual spaces. This pipeline transforms user-provided 2D images into context-rich 3D environments through an iterative process consisting of depth estimation, mesh alignment, and diffusion-based space completion guided by geometric priors and adaptive text prompts. In a preliminary within-subjects study, where 20 participants performed a collaborative VR affinity diagramming task in pairs, we compared SpaceBlender with a generic virtual environment and a state-of-the-art scene generation framework, evaluating its ability to create virtual spaces suitable for collaboration. Participants appreciated the enhanced familiarity and context provided by SpaceBlender but also noted complexities in the generative environments that could detract from task focus. Drawing on participant feedback, we propose directions for improving the pipeline and discuss the value and design of blended spaces for different scenarios.

Paper Structure

This paper contains 65 sections, 16 figures, and 1 table.

Figures (16)

  • Figure 1: A bird's-eye view of two meshes that failed to blend due to the lack of geometric guidance and context throughout the iterative mesh completion process.
  • Figure 2: Overview of Stage 1 components as described in the Stage 1 section.
  • Figure 3: Comparison between unaligned submeshes and submeshes aligned with our semantic floor alignment technique. The unaligned spaces have floors at different levels and inclines that can be jarring to navigate.
  • Figure 4: Overview of Stage 2 components as described in the Stage 2 section.
  • Figure 5: Comparison of outputs generated with varying weights for the ControlNet depth and layout models, which control how strongly the prior influences the output (generated with a fixed seed). Top: the input image, plus the depth prior and layout prior images rendered from the geometric prior. Bottom: results, with the weights indicated in parentheses.
  • ...and 11 more figures