LIVE-GS: LLM Powers Interactive VR by Enhancing Gaussian Splatting

Haotian Mao, Zhuoxiong Xu, Siyue Wei, Yule Quan, Nianchen Deng, Xubo Yang

TL;DR

This work tackles real-time, physically plausible interaction in radiance-field VR by integrating language-model-based scene understanding with 3D Gaussian Splatting. Specifically, it treats a scene as a collection of Gaussian kernels $G_k$ with center $\mathbf{p}_k$ and covariance $\Sigma_k$, and uses GPT-4o to extract object- and particle-level properties that drive a unified PBD-based interpolation for rigid, soft, and granular dynamics. A GPT-assisted GS inpainting module fills unseen regions, and a feature-mask segmentation strategy precisely segments the kernels of interaction targets. Real-time VR demos show complex behaviors (e.g., a doll-like wolf, a breaking mug, bouncing balls) without manual annotation, illustrating the coupling between scene understanding, rendering, and physics. This approach promises scalable, language-guided VR asset creation with realistic interactivity.

Abstract

Recently, radiance field rendering, such as 3D Gaussian Splatting (3DGS), has shown immense potential in VR content creation due to its high-quality rendering and efficient production process. However, existing physics-based interaction systems for 3DGS can only perform simple and non-realistic simulations or demand extensive user input for complex scenes, primarily due to the absence of scene understanding. In this paper, we propose LIVE-GS, a highly realistic interactive VR system powered by an LLM. After object-aware GS reconstruction, we prompt GPT-4o to analyze the physical properties of objects in the scene, which are used to guide physical simulations consistent with real phenomena. We also design a GPT-assisted GS inpainting module to fill the unseen areas covered by manipulable objects. To perform a precise segmentation of Gaussian kernels, we propose a feature-mask segmentation strategy. To enable rich interaction, we further propose a computationally efficient physical simulation framework built on a PBD-based unified interpolation method, supporting various physical forms such as rigid bodies, soft bodies, and granular materials. Our experimental results show that, with the help of the LLM's understanding and enhancement of scenes, our VR system can support complex and realistic interactions without additional manual design and annotation.
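
To make the PBD (Position Based Dynamics) foundation mentioned in the abstract more concrete, the following is a minimal sketch of one generic PBD step with distance constraints. It is not the paper's unified interpolation method; the function name `pbd_step`, its parameters, and the choice of distance constraints are illustrative assumptions.

```python
import numpy as np

def pbd_step(x, v, inv_mass, edges, rest_len, dt,
             gravity=(0.0, -9.81, 0.0), iters=10):
    """One generic PBD step (illustrative sketch, not the paper's solver).

    x, v      : (N, 3) particle positions and velocities
    inv_mass  : (N,) inverse masses; 0 marks a pinned particle
    edges     : list of (i, j) index pairs joined by a distance constraint
    rest_len  : rest length for each edge
    """
    g = np.asarray(gravity, dtype=float)
    movable = (inv_mass > 0)[:, None]      # pinned particles ignore gravity
    v = v + dt * g * movable
    p = x + dt * v                         # predicted positions

    # Iteratively project each distance constraint onto the prediction.
    for _ in range(iters):
        for (i, j), d0 in zip(edges, rest_len):
            w = inv_mass[i] + inv_mass[j]
            if w == 0.0:
                continue                   # both endpoints pinned
            delta = p[i] - p[j]
            dist = np.linalg.norm(delta)
            if dist < 1e-12:
                continue                   # direction undefined
            corr = (dist - d0) / w * (delta / dist)
            p[i] -= inv_mass[i] * corr
            p[j] += inv_mass[j] * corr

    v_new = (p - x) / dt                   # velocity from position change
    return p, v_new
```

In a sketch like this, material behavior comes from the constraint set and its stiffness: a dense set of stiff distance constraints behaves close to a rigid body, while fewer or softer constraints give softer motion.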

Paper Structure

This paper contains 16 sections, 15 equations, 11 figures, 6 tables.

Figures (11)

  • Figure 1: System overview. Our system consists of three parts: scene reconstruction, scene enhancement, and an interactive framework. With the original images and initial point clouds, we train a Gaussian model with identity encoding and segment our targets through feature-mask segmentation. Afterwards, we leverage GPT to enhance our system's scene understanding by analyzing objects' properties and tracking possible artifacts caused by object removal. We introduce another fine-tuning stage to inpaint possible artifacts. Finally, we implement our GS-based VR system with the Oculus SDK and the Obi solver, achieving immersive real-time interactions.
  • Figure 2: Feature-mask segmentation.
  • Figure 3: Artifact tracking. We input the source image, the removal image, and the related mask into GPT-4o to obtain color prompts. Then we track the artifacts with DEVA and intersect them with the mask, generating the final mask for the 2D inpainting method.
  • Figure 4: Particle interpolation for Gaussian kernels. The rotation of the particles not only contributes to the rotation quaternion but also affects the transition vector.
  • Figure 5: Particle filling in a container. Six detections fail in the region above the surface, but five detections can succeed. Filling particles find the nearest surface in the shrinkage and inherit their attributes.
  • ...and 6 more figures
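
The Figure 4 caption notes that a particle's rotation affects both a kernel's orientation and its displacement. A minimal sketch of that idea follows, under the simplifying assumption that each Gaussian kernel is rigidly attached to a single simulation particle (the paper interpolates over particles, which is not reproduced here). The helper names `quat_to_matrix` and `move_kernel` are hypothetical.

```python
import numpy as np

def quat_to_matrix(q):
    """Convert a quaternion (w, x, y, z) to a 3x3 rotation matrix."""
    w, x, y, z = np.asarray(q, dtype=float) / np.linalg.norm(q)
    return np.array([
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)],
        [2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)],
    ])

def move_kernel(center, cov, particle_rest, particle_now, particle_quat):
    """Carry one Gaussian kernel along with one particle (assumed binding).

    center, cov   : kernel center p_k (3,) and covariance Sigma_k (3, 3) at rest
    particle_rest : particle position at rest
    particle_now  : particle position after simulation
    particle_quat : particle rotation since rest, as (w, x, y, z)
    """
    R = quat_to_matrix(particle_quat)
    # The rest-pose offset is rotated too, so rotation changes the displacement.
    new_center = particle_now + R @ (center - particle_rest)
    # A rotated Gaussian has covariance R Sigma R^T.
    new_cov = R @ cov @ R.T
    return new_center, new_cov
```

For example, a kernel offset along the x-axis from its particle ends up offset along the y-axis after a 90-degree rotation about z, even if the particle itself only moved vertically.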