Roomify: Spatially-Grounded Style Transformation for Immersive Virtual Environments

Xueyang Wang; Qinxuan Cen; Weitao Bi; Yunxiang Ma; Xin Yi; Robert Xiao; Xinyi Fu; Hewu Li

TL;DR

Roomify is a spatially-grounded transformation system that generates themed virtual environments anchored to users' physical rooms while maintaining spatial structure and functional semantics. The paper also introduces a cross-reality authoring tool that gives users fine-grained control through MR editing and VR preview workflows.

Abstract

We present Roomify, a spatially-grounded transformation system that generates themed virtual environments anchored to users' physical rooms while maintaining spatial structure and functional semantics. Current VR approaches face a fundamental trade-off: full immersion sacrifices spatial awareness, while passthrough solutions break presence. Roomify addresses this through spatially-grounded transformation - treating physical spaces as "spatial containers" that preserve key functional and geometric properties of furniture while enabling radical stylistic changes. Our pipeline combines in-situ 3D scene understanding, AI-driven spatial reasoning, and style-aware generation to create personalized virtual environments grounded in physical reality. We introduce a cross-reality authoring tool enabling fine-grained user control through MR editing and VR preview workflows. Two user studies validate our approach: one with 18 VR users demonstrates a 63% improvement in presence over passthrough and 26% over fully virtual baselines while maintaining spatial awareness; another with 8 design professionals confirms the system's creative expressiveness (scene quality: 5.95/7; creativity support: 6.08/7) and professional workflow value across diverse environments.

Paper Structure (53 sections, 15 figures, 3 tables, 2 algorithms)

Introduction
Related Work
Integrating Physical and Virtual Environments
3D Scene Understanding and Generative Stylization
AI-Assisted Spatial Authoring
Formative Study
Methodology
Findings
Design Requirements
Spatially-Grounded Scene Generation Pipeline
Pipeline Overview
Spatial Scene Understanding
Style Extraction and Mapping
Multi-Modal Content Generation
Scene Composition
...and 38 more sections

Figures (15)

Figure 1: The Roomify user journey from physical room to themed virtual environment. Users begin in their real space (a), scan the room geometry (b), review spatial understanding results (c), specify their desired style through multimodal input (d), observe and adjust real-time generation in MR (e), and finally immerse themselves in the transformed environment in VR (f). The system preserves spatial layout and functional semantics of furniture while enabling radical stylistic transformation, allowing users to inhabit fantastical worlds grounded in their familiar physical space.
Figure 2: Spatially-grounded scene generation pipeline showing the four-stage transformation process from physical room capture to stylized virtual environment.
Figure 3: Style extraction and mapping workflow showing transformation from user intent and spatial understanding results to structured generation specifications. The process extracts style keywords from multimodal input and creates object-level mapping tables satisfying four criteria: function consistency, style coherence, environmental consistency, and interaction safety.
Figure 4: Reference-guided object generation workflow: best-view frame selection captures optimal geometry, style-aware image generation preserves spatial characteristics, and 3D model conversion completes the transformation. Examples show diverse styles while maintaining semantic recognition and spatial consistency.
Figure 5: Cross-Reality Authoring Tool interface. (A) Spatial Scaffold Visualization displays detected objects with labeled, color-coded wireframe boundaries overlaid on the physical environment. (B) Multimodal Style Specification enables combining text descriptions and reference images. (C) Spatial Scaffold Manipulation provides controls for selection, rotation, translation, and scaling; selected objects become translucent to reveal underlying physical objects for accurate alignment. (D) Generation Process Supervision shows status panels with generation progress, object information, and voice-based refinement commands; wireframe colors indicate status (blue: generating, green: complete, red: requires attention).
...and 10 more figures
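
The captions for Figures 3 and 5 describe an object-level mapping table checked against four criteria and a wireframe color for each generation status. As a rough illustration only, one way such an entry could be represented is sketched below. The class names, field names, and the rule that a failed criterion turns the wireframe red are assumptions made for this sketch; they are not taken from the paper's implementation. Only the four criterion names and the three status colors come from the captions.

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    GENERATING = "generating"
    COMPLETE = "complete"
    NEEDS_ATTENTION = "requires attention"


# Wireframe colors as listed in the Figure 5 caption.
WIREFRAME_COLOR = {
    Status.GENERATING: "blue",
    Status.COMPLETE: "green",
    Status.NEEDS_ATTENTION: "red",
}

# The four mapping criteria named in the Figure 3 caption.
CRITERIA = (
    "function_consistency",
    "style_coherence",
    "environmental_consistency",
    "interaction_safety",
)


@dataclass
class MappingEntry:
    """One row of a hypothetical object-level mapping table."""
    physical_label: str   # assumed: label of the detected physical object
    virtual_label: str    # assumed: proposed themed replacement
    checks: dict          # assumed: criterion name -> bool
    status: Status = Status.GENERATING

    def failed_criteria(self):
        # A criterion missing from `checks` is treated as not satisfied.
        return [c for c in CRITERIA if not self.checks.get(c, False)]

    def wireframe_color(self):
        # Assumed rule: any failed criterion flags the object for attention.
        if self.failed_criteria():
            return WIREFRAME_COLOR[Status.NEEDS_ATTENTION]
        return WIREFRAME_COLOR[self.status]


ok = MappingEntry("sofa", "mossy log bench", {c: True for c in CRITERIA})
bad = MappingEntry("desk", "floating slab", {"function_consistency": True})
print(ok.wireframe_color(), bad.wireframe_color(), bad.failed_criteria())
```

Keeping the criteria as an explicit tuple means a mapping that omits a check is flagged rather than silently accepted, which seems in keeping with the caption's emphasis on interaction safety.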
