RING-NeRF : Rethinking Inductive Biases for Versatile and Efficient Neural Fields

Doriand Petit, Steve Bourgeois, Dumitru Pavel, Vincent Gay-Bellile, Florian Chabot, Loic Barthe

TL;DR

RING-NeRF introduces two inductive biases, a continuous multi-scale scene representation and a decoder latent space that is invariant to position and scale, to address NeRF limitations without task-specific supervision. It couples a distance-aware forward mapping in contracted space with a continuous coarse-to-fine optimization, yielding an unbounded representation with adaptive level of detail (LOD). Across novel view synthesis, anti-aliasing, few-view supervision, and SDF reconstruction without scene-specific initialization, it achieves competitive or superior quality with improved efficiency and robustness. It also allows the model resolution to be extended dynamically by adding grid levels without retraining the decoder. This work suggests a simple, bias-driven alternative to bespoke architectures and points toward memory-efficient, sparse neural fields for unbounded scenes.

Abstract

Recent advances in Neural Fields mostly rely on developing task-specific supervision, which often complicates the models. Rather than developing specific, hard-to-combine modules, another generally overlooked approach is to directly inject generic priors on the scene representation (also called inductive biases) into the NeRF architecture. Based on this idea, we propose the RING-NeRF architecture, which includes two inductive biases: a continuous multi-scale representation of the scene and an invariance of the decoder's latent space over spatial and scale domains. We also design a single reconstruction process that takes advantage of those inductive biases, and we experimentally demonstrate quality on par with dedicated architectures on multiple tasks (anti-aliasing, few-view reconstruction, SDF reconstruction without scene-specific initialization) while being more efficient. Moreover, RING-NeRF has the distinctive ability to dynamically increase the resolution of the model, opening the way to adaptive reconstruction.

Paper Structure (33 sections, 6 equations, 19 figures, 19 tables)

Figures (19)

  • Figure 1: Outputs of the model at different LODs when trained only on the last level $L=7$. Even without supervising intermediate LODs during training, the scene reconstruction captures a notion of LOD. The LOD is also visually continuous: as expected, level $L=3.5$ outputs a 3D representation whose amount of detail lies between LOD $L=3$ and $L=4$.
  • Figure 2: Overview of RING-NeRF. To render a pixel, the cast cone is sampled with cubes. Depending on the cube volume, the corresponding LOD of the scene is selected and the latent feature is computed using a weighted sum of the grid hierarchy. The density (or SDF) and color of the cube are first decoded from the latent feature with a tiny MLP and then integrated with other samples through volume rendering.
  • Figure 2: Outputs of the model at different LODs when trained only on the last level $L=7$. This illustrates the LOD inductive bias even on scenes more complex than single DTU objects.
  • Figure 3: We demonstrate the LOD inductive bias by training our model with a hierarchy of $N=7$ levels where only the last level of the mapping function, $G_N$, is supervised. We then compute renders at different levels $L \leq N$. Other examples (including entire scenes) can be found in the supplementary materials.
  • Figure 3: Comparison of LODs when supervising (a) every level of detail or (b) solely the finest level of detail.
  • ...and 14 more figures
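
The captions above describe a latent feature computed as a weighted sum over the grid hierarchy, queried at a continuous LOD such as $L=3.5$. The following is a minimal Python sketch of that idea, not the authors' implementation. It assumes a toy 1D hierarchy, and the weighting rule (a level is fully on once the LOD reaches it and fades in linearly just before) is an assumption chosen for illustration. All function names (`lod_weights`, `interp_grid`, `latent_feature`, `make_grids`) are hypothetical. The mapping from cube volume to LOD, the MLP decoder, and volume rendering are omitted.

```python
import math
import random


def lod_weights(lod, num_levels):
    """Per-level weights for a continuous LOD in [0, num_levels].

    Level l (1-indexed) is fully on once lod >= l, fades in linearly
    for l - 1 < lod < l, and is off otherwise. This weighting is an
    assumption of this sketch, not a formula taken from the paper.
    """
    return [min(1.0, max(0.0, lod - (l - 1))) for l in range(1, num_levels + 1)]


def interp_grid(grid, x):
    """Linearly interpolate a 1D feature grid at x in [0, 1]."""
    pos = x * (len(grid) - 1)
    i0 = min(int(math.floor(pos)), len(grid) - 2)
    t = pos - i0
    return [(1.0 - t) * a + t * b for a, b in zip(grid[i0], grid[i0 + 1])]


def latent_feature(grids, x, lod):
    """Weighted sum of per-level features at position x and LOD lod."""
    weights = lod_weights(lod, len(grids))
    feat = [0.0] * len(grids[0][0])
    for w, grid in zip(weights, grids):
        if w == 0.0:
            continue  # levels finer than the requested LOD do not contribute
        for k, v in enumerate(interp_grid(grid, x)):
            feat[k] += w * v
    return feat


def make_grids(num_levels=7, base_res=2, feat_dim=4, seed=0):
    """Hypothetical toy hierarchy: level l has base_res * 2**(l-1) + 1 vertices."""
    rng = random.Random(seed)
    return [
        [[rng.uniform(-1.0, 1.0) for _ in range(feat_dim)]
         for _ in range(base_res * 2 ** level + 1)]
        for level in range(num_levels)
    ]


if __name__ == "__main__":
    grids = make_grids()
    for lod in (3.0, 3.5, 4.0):
        print(lod, latent_feature(grids, 0.37, lod))
```

Under this weighting, the feature at $L=3.5$ is the midpoint of the features at $L=3$ and $L=4$, which is one simple way to obtain the in-between behaviour described for Figure 1. Appending a finer grid level also leaves the features at all existing LODs unchanged, since the new level receives zero weight there. Whether this matches the paper's exact formulation should be checked against its equations.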