City-on-Web: Real-time Neural Rendering of Large-scale Scenes on the Web

Kaiwen Song, Xiaoyi Zeng, Chenqu Ren, Juyong Zhang

TL;DR

City-on-Web tackles real-time neural rendering of large-scale scenes on the web by combining a block-based radiance-field representation with an explicit Level-of-Detail scheme and dynamic loading. It enables per-block shaders to render large environments while preserving 3D occlusion through depth-sorted alpha blending, and proves equivalence to traditional volume rendering under Lambertian assumptions. The approach yields real-time rendering at about 32 FPS at 1080p on a consumer RTX 3060 with memory usage around 1100 MB, while maintaining high perceptual quality (SSIM/LPIPS) and competitive PSNR on urban datasets, via training with a finest-LOD model and progressive LOD baking. The work demonstrates practical web deployment of large-scale neural rendering, offering significant memory savings, scalable rendering, and a thorough experimental study, while acknowledging limitations related to lighting variation and non-Lambertian effects that motivate future improvements.

Abstract

Existing neural radiance field-based methods can achieve real-time rendering of small scenes on the web platform. However, extending these methods to large-scale scenes still poses significant challenges due to limited resources in computation, memory, and bandwidth. In this paper, we propose City-on-Web, the first method for real-time rendering of large-scale scenes on the web. We propose a block-based volume rendering method to guarantee 3D consistency and correct occlusion between blocks, and introduce a Level-of-Detail strategy combined with dynamic loading/unloading of resources to significantly reduce memory demands. Our system achieves real-time rendering of large-scale scenes at approximately 32 FPS with an RTX 3060 GPU on the web and maintains rendering quality comparable to the current state-of-the-art novel view synthesis methods.
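
The depth-sorted blending of per-block results mentioned above can be illustrated with a minimal sketch for a single pixel. It assumes each block has already produced an accumulated (premultiplied) color, an opacity, and a representative depth. The names `BlockSample` and `composite_front_to_back` are hypothetical and are not taken from the paper's implementation.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass
class BlockSample:
    """Per-block result for one pixel (hypothetical container)."""
    color: Tuple[float, float, float]  # premultiplied by the block's own opacity
    alpha: float                       # accumulated opacity of the block in [0, 1]
    depth: float                       # representative depth used for sorting


def composite_front_to_back(samples: Iterable[BlockSample]):
    """Blend block results nearest-first.

    Each block's color is scaled by the transmittance left over
    after all closer blocks, so nearer blocks occlude farther ones.
    """
    out = [0.0, 0.0, 0.0]
    transmittance = 1.0  # fraction of light not yet blocked
    for s in sorted(samples, key=lambda s: s.depth):
        for c in range(3):
            out[c] += transmittance * s.color[c]
        transmittance *= 1.0 - s.alpha
    return tuple(out), 1.0 - transmittance


# Usage: a half-transparent red block in front of an opaque green block.
near = BlockSample(color=(0.5, 0.0, 0.0), alpha=0.5, depth=1.0)
far = BlockSample(color=(0.0, 1.0, 0.0), alpha=1.0, depth=2.0)
rgb, opacity = composite_front_to_back([far, near])  # input order does not matter
```

Because the samples are sorted by depth before accumulation, the result does not depend on the order in which blocks are rendered, which is the property that preserves occlusion across block boundaries.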

Paper Structure

This paper contains 23 sections, 16 equations, 9 figures, and 5 tables.

Figures (9)

  • Figure 1: Overview of the City-on-Web pipeline. During the training phase, we uniformly partition the scene and reconstruct it at the finest LOD. To ensure 3D consistency, we use a resource-independent block-based volume rendering strategy (see the training section). For LOD generation, we downsample virtual grid points and retrain a coarser model (see the LOD section). This approach supports subsequent real-time rendering by facilitating the dynamic loading of rendering resources.
  • Figure 2: Visualization comparison between the alpha blending method and others. (a) Top image: incorrect occlusion without depth sorting. Bottom image: incorrect rendering results when simply using $\alpha_i/(\sum_j \alpha_{j})$ as blending weights. (b) Left: rendering results of four separate blocks and the final blending result. Right: visualization of sample points' rendering weights before and after alpha blending.
  • Figure 3: Block-based volume rendering. "DR" denotes deferred rendering. $\Phi$ represents the deferred MLP.
  • Figure 4: Qualitative comparisons with existing SOTA methods. Tests across diverse scales and environments show that our approach recovers finer details and achieves higher reconstruction quality.
  • Figure 5: Visualization of our LOD result.
  • ...and 4 more figures