INPC: Implicit Neural Point Clouds for Radiance Field Rendering

Florian Hahlbohm, Linus Franke, Moritz Kappel, Susana Castillo, Martin Eisemann, Marc Stamminger, Marcus Magnor

TL;DR

This work introduces Implicit Neural Point Clouds (INPC), a hybrid representation that encodes geometry with an octree-based point probability field and appearance with a multi-resolution hash grid, enabling extraction of explicit point clouds and fast rasterization-based rendering for unbounded scenes. By combining view-specific and view-independent sampling, differentiable bilinear splatting, and an end-to-end optimization pipeline with robust and perceptual losses, INPC achieves state-of-the-art perceptual image quality on challenging benchmarks while offering interactive rendering speeds on consumer hardware. The approach avoids heavy ray-marching and explicit priors, yet preserves geometric detail, and can convert trained models into dense explicit point clouds to further boost performance. Extensive experiments, including ablations and a perceptual study against Zip-NeRF, demonstrate improved visual fidelity and robust optimization, highlighting the practical impact of a scalable, hybrid radiance-field representation.

Abstract

We introduce a new approach for reconstruction and novel view synthesis of unbounded real-world scenes. In contrast to previous methods using either volumetric fields, grid-based models, or discrete point cloud proxies, we propose a hybrid scene representation, which implicitly encodes the geometry in a continuous octree-based probability field and view-dependent appearance in a multi-resolution hash grid. This allows for extraction of arbitrary explicit point clouds, which can be rendered using rasterization. In doing so, we combine the benefits of both worlds and retain favorable behavior during optimization: Our novel implicit point cloud representation and differentiable bilinear rasterizer enable fast rendering while preserving the fine geometric detail captured by volumetric neural fields. Furthermore, this representation does not depend on priors like structure-from-motion point clouds. Our method achieves state-of-the-art image quality on common benchmarks. Furthermore, we achieve fast inference at interactive frame rates, and can convert our trained model into a large, explicit point cloud to further enhance performance.
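To make the idea of extracting an explicit point cloud from a probability field more concrete, here is a minimal sketch. It is not the authors' implementation: it assumes the octree leaves are given as flat arrays of cube centers, edge lengths, and unnormalized probabilities, and it draws positions by picking leaves in proportion to their probability and then placing each point uniformly inside the chosen cube. The function name `sample_points` and all array names are hypothetical, and the view-specific and view-independent sampling strategies mentioned in the TL;DR are not modeled.

```python
import numpy as np

def sample_points(leaf_centers, leaf_sizes, leaf_probs, n_points, rng):
    """Draw point positions from a piecewise-constant probability field.

    Assumption (not from the paper): leaves are axis-aligned cubes described
    by their centers (N, 3), edge lengths (N,), and unnormalized
    probabilities (N,).
    """
    # Normalize the per-leaf values into a categorical distribution.
    p = leaf_probs / leaf_probs.sum()
    # Pick one leaf index per point, proportionally to its probability.
    idx = rng.choice(len(p), size=n_points, p=p)
    # Place each point uniformly at random inside its chosen cube.
    offsets = rng.uniform(-0.5, 0.5, size=(n_points, 3))
    points = leaf_centers[idx] + offsets * leaf_sizes[idx, None]
    return points, idx

# Toy example with three leaves; the second leaf has zero probability.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0, 0.0], [2.0, 0.0, 0.0], [0.0, 2.0, 0.0]])
sizes = np.array([1.0, 1.0, 0.5])
probs = np.array([0.7, 0.0, 0.3])
points, idx = sample_points(centers, sizes, probs, 1000, rng)
```

In this toy setup no point lands in the zero-probability leaf, and every point lies inside the cube it was drawn from. In the actual method, per-point appearance features would then be queried from the multi-resolution hash grid at these positions.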

Paper Structure

This paper contains 38 sections, 9 equations, 12 figures, 7 tables, 1 algorithm.

Figures (12)

  • Figure 1: Novel views synthesized by our model and three state-of-the-art baselines. Our implicit point cloud optimization excels at capturing fine detail, leading to higher visual fidelity than the baselines. While outperformed by explicit point-based methods \cite{kerbl3Dgaussians, franke2024trips} in terms of inference frame rates, our model renders 17$\times$ faster than Zip-NeRF \cite{barron2023ICCV}. Per-patch PSNR and per-scene fps values are inset.
  • Figure 2: Overview of our method: We introduce the implicit point cloud, a combination of a point probability field stored in an octree and implicitly stored appearance features. To render an image for a given viewpoint, we sample the representation by estimating point positions and querying the multi-resolution hash grid for per-point features. This explicit point cloud -- together with a small background MLP -- is then rendered with a bilinear point splatting module and post-processed by a CNN. During optimization, the neural networks as well as the implicit point cloud are optimized, efficiently reconstructing the scene.
  • Figure 3: Visual comparisons for ablations H, D, and J in Table \ref{tab:ablation_tables_tab}. Our background model prevents the sampling of points in the sky. Disabling octree subdivision causes foreground reconstruction to fail. Omitting post-processing (Section \ref{ssec:neural_pp_network}) leads to holes and high-frequency noise in renderings.
  • Figure 4: Visual comparison of INPC configurations. Our global pre-extraction slightly reduces visual quality in terms of fine detail, especially in FHD renderings of Tanks and Temples scenes. Without multisampling, images are slightly sharper, but our sampling sometimes misses thin structures. For Mip-NeRF360 scenes, the difference between our 8M and default (33M) configuration is barely visible.
  • Figure 5: Limitations. Our method and state-of-the-art baselines sometimes fail to recover fine geometric detail near the camera.
  • ...and 7 more figures
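
The bilinear point splatting step named in the Figure 2 caption can be illustrated with a small NumPy sketch. This is a generic bilinear splat written under stated assumptions, not the paper's rasterizer: points are already projected to continuous pixel coordinates, integer coordinates denote pixel centers, and contributions are simply summed without depth ordering or alpha blending. The function name `bilinear_splat` is hypothetical. The weights vary continuously with the point position, which is what makes this kind of splatting suitable for gradient-based optimization in an autodiff framework.

```python
import numpy as np

def bilinear_splat(xy, features, height, width):
    """Accumulate per-point features into the four pixels around each point.

    Assumptions (not from the paper): xy holds (x, y) pixel coordinates with
    integer values at pixel centers, and contributions are summed without
    depth sorting or blending.
    """
    image = np.zeros((height, width, features.shape[1]))
    weight = np.zeros((height, width))
    x0 = np.floor(xy[:, 0]).astype(int)
    y0 = np.floor(xy[:, 1]).astype(int)
    fx = xy[:, 0] - x0  # fractional offset in x, in [0, 1)
    fy = xy[:, 1] - y0  # fractional offset in y, in [0, 1)
    for dx, dy in ((0, 0), (1, 0), (0, 1), (1, 1)):
        # Bilinear weight of the neighbor at offset (dx, dy).
        w = (fx if dx else 1.0 - fx) * (fy if dy else 1.0 - fy)
        xi, yi = x0 + dx, y0 + dy
        # Discard contributions that fall outside the image.
        ok = (xi >= 0) & (xi < width) & (yi >= 0) & (yi < height)
        # np.add.at accumulates correctly when several points hit one pixel.
        np.add.at(image, (yi[ok], xi[ok]), w[ok, None] * features[ok])
        np.add.at(weight, (yi[ok], xi[ok]), w[ok])
    return image, weight

# One point at x = 1.25, y = 2.5 with a single feature channel.
xy = np.array([[1.25, 2.5]])
features = np.array([[1.0]])
image, weight = bilinear_splat(xy, features, height=5, width=5)
```

For this single point, the four neighboring pixels receive weights 0.375, 0.125, 0.375, and 0.125, which sum to one. The accumulated `weight` buffer could be used to normalize `image` before further processing.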