Video2Game: Real-time, Interactive, Realistic and Browser-Compatible Environment from a Single Video

Hongchi Xia; Zhi-Hao Lin; Wei-Chiu Ma; Shenlong Wang

TL;DR

Video2Game presents an end-to-end pipeline that turns a video of a real-world scene into a real-time, interactive, browser-accessible virtual environment. It fuses large-scale NeRF rendering with a subsequent NeRF-to-mesh baking stage that preserves quality while enabling rapid, WebGL-based rendering, and then attaches a physics module by decomposing the scene into actionable objects with rigid-body dynamics. The key contributions include (a) large-scale NeRF with semantic and geometric regularization for unbounded scenes, (b) a mesh baking framework with neural textures for game engines, and (c) a physics-aware representation enabling interactive navigation, manipulation, and robot simulation in browser contexts. This approach enables accessible, high-fidelity digital twins suitable for gaming, simulators, and robotic training, with demonstrated browser performance and cross-platform export capabilities.

Abstract

Creating high-quality and interactive virtual environments, such as games and simulators, often involves complex and costly manual modeling processes. In this paper, we present Video2Game, a novel approach that automatically converts videos of real-world scenes into realistic and interactive game environments. At the heart of our system are three core components: (i) a neural radiance fields (NeRF) module that effectively captures the geometry and visual appearance of the scene; (ii) a mesh module that distills the knowledge from NeRF for faster rendering; and (iii) a physics module that models the interactions and physical dynamics among the objects. By following the carefully designed pipeline, one can construct an interactable and actionable digital replica of the real world. We benchmark our system on both indoor and large-scale outdoor scenes. We show that we can not only produce highly realistic renderings in real time, but also build interactive games on top.

Paper Structure (60 sections, 4 equations, 12 figures, 12 tables)

Introduction
Related Works
Video2Game
Large-scale NeRF
Preliminaries:
Large-scale NeRF:
Depth:
Surface normals:
Semantics:
Regularization:
Blocking:
Learning:
NeRF Baking
Mesh representation:
Rendering:
...and 45 more sections

Figures (12)

Figure 1: Video2Game takes an input video of an arbitrary scene and automatically transforms it into a real-time, interactive, realistic and browser-compatible environment. Users can freely explore the environment and interact with the objects in the scene.
Figure 2: Overview of Video2Game: Given multiple posed images from a single video as input, we first construct a large-scale NeRF model that is realistic and possesses high-quality surface geometry. We then transform this NeRF model into a mesh representation with corresponding rigid-body dynamics to enable interactions. We utilize a UV-mapped neural texture, which is both expressive and compatible with game engines. Finally, we obtain an interactive virtual environment that virtual actors can interact with, that responds to user control, and that delivers high-resolution rendering from novel camera perspectives, all in real time.
Figure 3: Visualization of automatically computed collision geometry: Sphere collider (green), box collider (yellow), convex polygon collider (purple) and trimesh collider (red).
Figure 4: Qualitative comparisons among NeRF models. The rendering quality of our base NeRF is superior to the baselines, and by leveraging monocular cues we substantially improve the rendered geometry compared to the baselines. This significantly facilitates NeRF baking in subsequent stages. Here we use depths measured from the LiDAR point clouds in KITTI-360 and compute normals from them.
Figure 5: Qualitative comparisons among mesh models. We compare our mesh rendering method with others on the Garden scene from Mip-NeRF 360 (barron2022mip).
...and 7 more figures
