Visionary: The World Model Carrier Built on WebGPU-Powered Gaussian Splatting Platform

Yuning Gong, Yifei Liu, Yifan Zhan, Muyao Niu, Xueying Li, Yuanjun Liao, Jiaming Chen, Yuanyuan Gao, Jiaqi Chen, Minming Chen, Li Zhou, Yuning Zhang, Wei Wang, Xiaoqing Hou, Huaxi Huang, Shixiang Tang, Le Ma, Dingwen Zhang, Xue Yang, Junchi Yan, Yanchi Zhang, Yinqiang Zheng, Xiao Sun, Zhihang Zhong

TL;DR

Visionary tackles the deployment friction of real-time 3D Gaussian Splatting by delivering a web-native World Model Carrier that integrates per-frame ONNX inference with a WebGPU renderer. The Gaussian Generator contract and a lightweight three.js API enable plug-and-play 3DGS variants (including MLP-based 3DGS, 4DGS, and animatable avatars) with generative post-processing directly in the browser. Key contributions include a reference WebGPU implementation, contract-driven extensibility, and demonstrated end-to-end speedups over existing web viewers, facilitating reproduction and cross-method comparison. This work significantly lowers the barrier to in-browser neural rendering, supporting both reconstructive and generative world-model paradigms and paving the way for physics-aware, interactive 3D environments and embodied agents in client-side systems.

Abstract

Neural rendering, particularly 3D Gaussian Splatting (3DGS), has evolved rapidly and become a key component for building world models. However, existing viewer solutions remain fragmented, heavy, or constrained by legacy pipelines, resulting in high deployment friction and limited support for dynamic content and generative models. In this work, we present Visionary, an open, web-native platform for real-time rendering of various Gaussian Splatting variants and meshes. Built on an efficient WebGPU renderer with per-frame ONNX inference, Visionary enables dynamic neural processing while maintaining a lightweight, "click-to-run" browser experience. It introduces a standardized Gaussian Generator contract, which not only supports standard 3DGS rendering but also allows plug-and-play algorithms to generate or update Gaussians each frame. Such inference also enables us to apply feedforward generative post-processing. The platform further offers a plug-in three.js library with a concise TypeScript API for seamless integration into existing web applications. Experiments show that, under identical 3DGS assets, Visionary achieves superior rendering efficiency compared to current web viewers due to GPU-based primitive sorting. It already supports multiple variants, including MLP-based 3DGS, 4DGS, neural avatars, and style transformation or enhancement networks. By unifying inference and rendering directly in the browser, Visionary significantly lowers the barrier to reproduction, comparison, and deployment of 3DGS-family methods, serving as a unified World Model Carrier for both reconstructive and generative paradigms.
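To make the idea of a per-frame generator contract concrete, the following is a minimal TypeScript sketch. It is not Visionary's actual API: the names `GaussianBatch`, `GaussianGenerator`, `StaticGenerator`, `OscillatingGenerator`, and `collectFrame` are hypothetical, and the sine-based motion is only a stand-in for the per-frame ONNX inference the abstract describes.

```typescript
// Hypothetical types for illustration only; not Visionary's real interface.
interface GaussianBatch {
  count: number;
  positions: Float32Array; // xyz per Gaussian, length = 3 * count
  opacities: Float32Array; // one value per Gaussian
}

// Assumed contract: the renderer asks each generator for its Gaussians every frame.
interface GaussianGenerator {
  generate(timeSec: number): GaussianBatch;
}

// Standard (static) 3DGS: the same Gaussians are returned every frame.
class StaticGenerator implements GaussianGenerator {
  private readonly batch: GaussianBatch;
  constructor(batch: GaussianBatch) {
    this.batch = batch;
  }
  generate(_timeSec: number): GaussianBatch {
    return this.batch;
  }
}

// Dynamic stand-in: shifts every Gaussian along x as a function of time.
// In a real system, this is where per-frame model inference would run.
class OscillatingGenerator implements GaussianGenerator {
  private readonly base: GaussianBatch;
  private readonly amplitude: number;
  constructor(base: GaussianBatch, amplitude: number) {
    this.base = base;
    this.amplitude = amplitude;
  }
  generate(timeSec: number): GaussianBatch {
    const positions = new Float32Array(this.base.positions); // copy, base stays intact
    const dx = this.amplitude * Math.sin(timeSec);
    for (let i = 0; i < this.base.count; i++) {
      positions[3 * i] += dx;
    }
    return { count: this.base.count, positions, opacities: this.base.opacities };
  }
}

// One frame: query every generator and report how many Gaussians would be submitted.
function collectFrame(generators: GaussianGenerator[], timeSec: number): number {
  let total = 0;
  for (const g of generators) {
    total += g.generate(timeSec).count;
  }
  return total;
}
```

Under this assumed design, a static scene and a dynamic one sit behind the same interface, so a renderer loop does not need to know which kind of asset it is drawing.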

Paper Structure

This paper contains 35 sections, 6 equations, 5 figures, 4 tables, 1 algorithm.

Figures (5)

  • Figure 1: Visionary as a Universal Runtime. Visionary’s core system is packaged as a three.js plug-in for seamless extension and integration. As a demonstration, we present a lightweight web-based editor that runs directly in the browser: by simply visiting a URL, users can leverage local computing resources through WebGPU to efficiently and simultaneously render multiple heterogeneous 3D/4D Gaussian assets, while maintaining full compatibility with traditional mesh-based rendering pipelines.
  • Figure 2:
  • Figure 3: Runtime comparison. Experiments are done under identical Gaussian complexity using the classic "bicycle" scene of 3DGS (Kerbl et al., 2023) (6M Gaussians at full resolution, and $1/2$, $1/4$, $1/8$ scales). (a): SparkJS shows a dominant CPU sorting cost. (b): Visionary shifts computation to GPU with low and stable overhead. (c): Log-scale comparison shows up to $\sim$100$\times$ speed-up over SparkJS.
  • Figure 4: Artifacts caused by lazy sorting in SparkJS. When rotating the viewpoint rapidly, the stale/incrementally updated order can become invalid, producing incorrect alpha compositing and visible temporal artifacts. The efficient implementation of Visionary avoids this flaw. The version of SparkJS tested here is the latest available, v0.1.10.
  • Figure 5: Wrong visualization caused by local sorting in Supersplat. Without a global ordering, overlapping Gaussians across partitions may be composited in an incorrect order. Visionary's efficient global sorting avoids this issue. The version of Supersplat tested here is the latest available, v2.15.1.
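
The ordering artifacts described in Figures 4 and 5 follow from the fact that alpha compositing is order-dependent. The sketch below is an illustration only, not the implementation of Visionary, SparkJS, or Supersplat: the `Splat` type and both functions are hypothetical, and the sort runs on the CPU for clarity, whereas Visionary is described as sorting on the GPU. It composites two overlapping splats along a single ray, once in the correct near-to-far order and once in a stale order.

```typescript
// Hypothetical single-ray model: each splat has a view depth, an RGB color, and an alpha.
interface Splat {
  depth: number;
  color: [number, number, number];
  alpha: number;
}

// Front-to-back "over" compositing in the order given:
// C = sum_i c_i * alpha_i * prod_{j<i} (1 - alpha_j)
function compositeFrontToBack(splats: Splat[]): [number, number, number] {
  const out: [number, number, number] = [0, 0, 0];
  let transmittance = 1; // fraction of light not yet absorbed by nearer splats
  for (const s of splats) {
    const weight = s.alpha * transmittance;
    out[0] += weight * s.color[0];
    out[1] += weight * s.color[1];
    out[2] += weight * s.color[2];
    transmittance *= 1 - s.alpha;
  }
  return out;
}

// Global sort by view depth, nearest first; returns a new array.
function sortByDepth(splats: Splat[]): Splat[] {
  return [...splats].sort((a, b) => a.depth - b.depth);
}

const nearRed: Splat = { depth: 1, color: [1, 0, 0], alpha: 0.75 };
const farBlue: Splat = { depth: 2, color: [0, 0, 1], alpha: 0.5 };

// An order left over from a previous viewpoint: the far splat comes first.
const staleOrder = [farBlue, nearRed];

const wrong = compositeFrontToBack(staleOrder);                // [0.375, 0, 0.5]
const correct = compositeFrontToBack(sortByDepth(staleOrder)); // [0.75, 0, 0.125]
console.log({ wrong, correct });
```

With the stale order, the far blue splat dominates the pixel even though the red splat is in front of it, which is the kind of error a fresh global depth sort prevents.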