
XScale-NVS: Cross-Scale Novel View Synthesis with Hash Featurized Manifold

Guangyu Wang, Jinzhi Zhang, Fan Wang, Ruqi Huang, Lu Fang

TL;DR

XScale-NVS tackles cross-scale novel view synthesis of real-world large-scale scenes by introducing a hash featurized manifold, which concentrates multi-resolution hash features on the 2D surface, paired with a deferred neural renderer. The method rasterizes surface-aligned hash features into screen space and decodes them with a lightweight MLP shader. Surface multisampling and a latent-space manifold deformation complement this pipeline and help ensure multi-view consistency and detail preservation. Key contributions are the hash featurized manifold representation, a rasterization-based rendering pipeline, two cross-scale enhancements, and the GigaNVS dataset for benchmarking real-world, cross-scale, high-resolution NVS. The method substantially outperforms prior baselines, notably lowering average LPIPS by 40% on GigaNVS. This work advances scalable, high-fidelity rendering of large scenes with micro-scale details, enabling more realistic virtual visual experiences and robust cross-scale perception in real-world contexts.
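The pipeline described above works in three steps. First, surface features are rasterized into a screen-space buffer, optionally using surface multisampling. Second, each pixel is decoded once by a small MLP shader conditioned on the view direction. The sketch below shows these steps in plain Python. It is a toy under stated assumptions, not the authors' renderer:
- It draws a single screen-space triangle and interpolates barycentrics in screen space, without perspective correction.
- The hypothetical `toy_features` stands in for the hash encoding.
- The shader is randomly initialised and untrained.
- The 4-sample pattern is hypothetical, and multisampled features are averaged before shading.
- The manifold deformation step is omitted.

```python
import math
import random

def edge(a, b, p):
    """Twice the signed area of triangle (a, b, p); the core of edge-function rasterization."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])

def rasterize(tri2d, tri3d, width, height, offsets):
    """Map each covered pixel to the 3D surface points hit by its sub-pixel samples.

    Barycentrics are interpolated in screen space (no perspective correction),
    a simplification for this sketch. Assumes a non-degenerate triangle.
    """
    area = edge(tri2d[0], tri2d[1], tri2d[2])
    gbuffer = {}
    for y in range(height):
        for x in range(width):
            for ox, oy in offsets:  # one entry per multisample position
                p = (x + ox, y + oy)
                w0 = edge(tri2d[1], tri2d[2], p) / area
                w1 = edge(tri2d[2], tri2d[0], p) / area
                w2 = edge(tri2d[0], tri2d[1], p) / area
                if min(w0, w1, w2) < 0:
                    continue  # this sample misses the triangle
                point = [w0 * tri3d[0][k] + w1 * tri3d[1][k] + w2 * tri3d[2][k] for k in range(3)]
                gbuffer.setdefault((x, y), []).append(point)
    return gbuffer

def toy_features(point):
    """Hypothetical stand-in for the surface hash encoding: 6 sinusoidal features."""
    return [math.sin(6.0 * c) for c in point] + [math.cos(6.0 * c) for c in point]

def init_shader(n_in, n_hidden, n_out, seed=0):
    """Random (untrained) weights for a one-hidden-layer MLP."""
    rng = random.Random(seed)
    w1 = [[rng.gauss(0.0, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
    w2 = [[rng.gauss(0.0, 0.5) for _ in range(n_hidden)] for _ in range(n_out)]
    return w1, [0.0] * n_hidden, w2, [0.0] * n_out

def mlp_shader(feature, view_dir, params):
    """Map [feature, view direction] to RGB in [0, 1] via ReLU hidden layer and sigmoid output."""
    w1, b1, w2, b2 = params
    inp = feature + view_dir
    hidden = [max(0.0, sum(w * v for w, v in zip(row, inp)) + b) for row, b in zip(w1, b1)]
    logits = [sum(w * h for w, h in zip(row, hidden)) + b for row, b in zip(w2, b2)]
    return [1.0 / (1.0 + math.exp(-z)) for z in logits]

def render(tri2d, tri3d, width, height, view_dir, params, offsets):
    """Deferred shading: build a per-pixel feature buffer first, then shade each pixel once."""
    image = {}
    for pixel, points in rasterize(tri2d, tri3d, width, height, offsets).items():
        feats = [toy_features(p) for p in points]
        # Assumption: multisamples are averaged in feature space before shading.
        mean = [sum(col) / len(feats) for col in zip(*feats)]
        image[pixel] = mlp_shader(mean, view_dir, params)
    return image

params = init_shader(n_in=6 + 3, n_hidden=16, n_out=3)
tri2d = [(1.0, 1.0), (14.0, 2.0), (6.0, 13.0)]               # screen-space vertices
tri3d = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0), (0.0, 1.0, 1.0)]  # matching 3D vertices
offsets = [(0.25, 0.25), (0.75, 0.25), (0.25, 0.75), (0.75, 0.75)]  # 4x multisampling
image = render(tri2d, tri3d, 16, 16, [0.0, 0.0, -1.0], params, offsets)
```

The shader runs once per pixel from a precomputed feature buffer, which is what makes the renderer "deferred". Its cost therefore scales with the number of covered pixels, not with the number of samples along each ray.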

Abstract

We propose XScale-NVS for high-fidelity cross-scale novel view synthesis of real-world large-scale scenes. Existing explicit surface-based representations are limited by their discretization resolution or suffer from UV distortion. Implicit volumetric representations, in turn, scale poorly to large scenes because of their dispersed weight distribution and surface ambiguity. To address these challenges, we introduce the hash featurized manifold, a novel hash-based featurization coupled with a deferred neural rendering framework. This approach fully unlocks the expressivity of the representation by explicitly concentrating the hash entries on the 2D manifold. It can thus represent highly detailed content independently of the discretization resolution. We also introduce a novel dataset, GigaNVS, to benchmark cross-scale, high-resolution novel view synthesis of real-world large-scale scenes. Our method significantly outperforms competing baselines on various real-world scenes, yielding an average LPIPS 40% lower than the prior state of the art on the challenging GigaNVS benchmark. Please see our project page at: xscalenvs.github.io.
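The abstract attributes the poor scalability of volumetric representations to a dispersed weight distribution along each camera ray. The toy below makes that concrete with standard volume-rendering weights $w_i = T_i\,(1 - e^{-\sigma_i \delta_i})$. It compares a soft density shell with a density spike at the surface, the latter behaving like rasterization, which picks a single surface point. The densities, the sample count, and the "effective samples" metric (perplexity of the normalised weights) are hypothetical choices for illustration. They are not the paper's analysis.

```python
import math

def volume_weights(sigmas, deltas):
    """Standard volume-rendering quadrature weights w_i = T_i * (1 - exp(-sigma_i * delta_i))."""
    weights, transmittance = [], 1.0
    for sigma, delta in zip(sigmas, deltas):
        alpha = 1.0 - math.exp(-sigma * delta)
        weights.append(transmittance * alpha)
        transmittance *= 1.0 - alpha
    return weights

def effective_samples(weights):
    """Perplexity exp(H) of the normalised weights; 1.0 means a single sample carries all weight."""
    total = sum(weights)
    probs = [w / total for w in weights if w > 0.0]
    return math.exp(-sum(p * math.log(p) for p in probs))

# Toy ray: 64 uniform samples on t in [0, 1), surface at t = 0.5 (hypothetical densities).
n = 64
ts = [i / n for i in range(n)]
deltas = [1.0 / n] * n
soft = [30.0 * math.exp(-((t - 0.5) / 0.1) ** 2) for t in ts]  # diffuse density shell
hard = [1e4 if i == n // 2 else 0.0 for i in range(n)]          # density spike at the surface

w_soft = volume_weights(soft, deltas)
w_hard = volume_weights(hard, deltas)
print(f"soft: {effective_samples(w_soft):.1f} effective samples, opacity {sum(w_soft):.3f}")
print(f"hard: {effective_samples(w_hard):.1f} effective samples, opacity {sum(w_hard):.3f}")
```

With the soft shell, the pixel colour is a blend of many samples spread in front of and behind the surface. Any multi-view inconsistency among those samples therefore leaks into the supervision of the surface features. With the spike, one sample receives all the weight.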
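The core idea of the hash featurized manifold is to query a volumetric multi-resolution hash encoding only at points on the reconstructed surface. Below is a minimal plain-Python sketch of that idea, assuming an Instant-NGP-style encoding [muller2022instant]. The class name `HashGrid`, all hyperparameters (levels, features per entry, table size, resolutions), and the toy triangle are hypothetical. This is not the authors' implementation.

```python
import math
import random

PRIMES = (1, 2654435761, 805459861)  # spatial-hash primes used by Instant-NGP

class HashGrid:
    """Minimal multi-resolution hash encoding for points in [0, 1]^3 (hypothetical settings).

    Simplification: every level is hashed, even coarse levels that would fit a dense grid.
    """

    def __init__(self, levels=4, feats=2, log2_T=12, n_min=16, n_max=512, seed=0):
        rng = random.Random(seed)
        self.levels, self.feats, self.T = levels, feats, 2 ** log2_T
        growth = math.exp((math.log(n_max) - math.log(n_min)) / max(levels - 1, 1))
        self.res = [math.floor(n_min * growth ** l) for l in range(levels)]
        # One table of T feature vectors per level, initialised to small values.
        self.tables = [[[rng.uniform(-1e-4, 1e-4) for _ in range(feats)]
                        for _ in range(self.T)] for _ in range(levels)]

    def _hash(self, ijk):
        """XOR of integer corner coordinates times primes, modulo the table size."""
        h = 0
        for c, p in zip(ijk, PRIMES):
            h ^= c * p
        return h % self.T

    def encode(self, x):
        """Trilinearly interpolate hashed corner features at every level and concatenate."""
        out = []
        for level in range(self.levels):
            pos = [xi * self.res[level] for xi in x]
            base = [math.floor(p) for p in pos]
            frac = [p - b for p, b in zip(pos, base)]
            acc = [0.0] * self.feats
            for corner in range(8):
                offs = [(corner >> k) & 1 for k in range(3)]
                w = 1.0
                for o, f in zip(offs, frac):
                    w *= f if o else 1.0 - f
                entry = self.tables[level][self._hash([b + o for b, o in zip(base, offs)])]
                for k in range(self.feats):
                    acc[k] += w * entry[k]
            out.extend(acc)
        return out

grid = HashGrid()
# Hypothetical mesh triangle inside the unit cube, and a point on it via barycentrics.
tri = [(0.2, 0.2, 0.5), (0.8, 0.3, 0.5), (0.4, 0.9, 0.5)]
bary = (0.3, 0.3, 0.4)
point = [sum(b * v[k] for b, v in zip(bary, tri)) for k in range(3)]
feature = grid.encode(point)  # levels * feats = 8 values
```

In this sketch only surface points are ever encoded. The table entries that receive gradients are therefore those touched by the 2D manifold, which is one way to read "concentrating the hash entries on the 2D manifold".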

Paper Structure

This paper contains 10 sections, 5 equations, 5 figures, 6 tables.

Figures (5)

  • Figure 1: We propose the hash featurized manifold representation for high-fidelity cross-scale neural rendering of real-world large-scale scenes. Compared with recent advances [10274871, kerbl20233d, barron2023zip], our method synthesizes novel views with unprecedented realism. Please zoom in to see the high-quality details.
  • Figure 2: Illustration of different featurizations. The curved triangle represents a micro surface patch in 3D with rich texture details, such as those revealed by close-up imagery of a real-world large scene. The bold purple dot represents the target pixel colour $\boldsymbol{c}$, and the black arrow $\boldsymbol{d}$ denotes the view direction of the camera ray. (a) UV-based featurizations [10274871, thies2019deferred, liu2023real] tend to disorganize the feature distribution due to distortions [floater2005surface, Hormann2007MeshPT, Ray2006PeriodicGP] in the surface parametrization $\psi$. (b) Existing 3D-surface-based featurizations [aliev2020neural, ruckert2022adop, zuo2022view, rakhimov2022npbg++, kopanas2022neural, yang2022neumesh, kerbl20233d] fail to express intricate sub-primitive-scale details because of their limited discretization resolution. (c) Volumetric featurizations [muller2022instant, li2023neuralangelo, rosu2023permutosdf, wang2023neus2, barron2023zip] inevitably yield a dispersed weight distribution during volume rendering. Many multi-view inconsistent yet highly weighted samples make the surface colour ambiguous and corrupt the surface features with inconsistent colour gradients. (d) Our method leverages hash encoding to decouple the featuremetric resolution from the discretization resolution. It uses rasterization to fully unleash the expressivity of volumetric hash encoding by propagating clean, multi-view consistent signals to the surface features.
  • Figure 3: An overview of the hash featurized manifold representation and our neural rendering framework. (a) We first reconstruct the scene as a mesh using MVS and featurize the surface manifold with volumetric multi-resolution hash encoding. (b) We then rasterize the featurized manifold into screen space and (c) optionally perform surface multisampling and manifold deformation to express a deformable frustum that better represents cross-scale details. (d) An MLP-based neural shader decodes the rasterized feature buffer and accounts for view-dependent colour. Notably, rasterization concentrates the featurization on the surface and keeps it multi-view consistent throughout optimization, inherently converting the redundant volumetric featurization into an expressive surface-based featurization.
  • Figure 4: Illustration of the real-captured, unstructured, cross-scale imagery in our GigaNVS dataset. We collect high-quality multi-view images at distances ranging from $5\,\mathrm{m}$ to $10^3\,\mathrm{m}$.
  • Figure 5: Novel view synthesis results on the GigaNVS dataset. Compared with [10274871, kerbl20233d, barron2023zip], our method robustly synthesizes realistic colour and intricate details, preserving approximately the input-level resolution. Please zoom in to see the details.
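A back-of-the-envelope pinhole estimate shows what the 5 m to 10^3 m capture range in Figure 4 implies. Assume, hypothetically, that one camera with pixel pitch $p$ and focal length $f$ images a fronto-parallel surface at distance $d$. Each pixel then covers a footprint of roughly $s(d)$, and the ratio between the farthest and nearest captures does not depend on the camera:

```latex
s(d) \approx \frac{d\,p}{f},
\qquad
\frac{s(10^{3}\,\mathrm{m})}{s(5\,\mathrm{m})} = \frac{10^{3}}{5} = 200 \approx 2^{7.64}
```

Under this assumption, the per-pixel footprint varies by a factor of about 200, or more than seven octaves. This is one way to see why a single fixed discretization resolution struggles across such data. Real captures mix cameras and focal lengths, so the figure is illustrative rather than a measured property of GigaNVS.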