Zip-NeRF: Anti-Aliased Grid-Based Neural Radiance Fields

Jonathan T. Barron, Ben Mildenhall, Dor Verbin, Pratul P. Srinivasan, Peter Hedman

TL;DR

Zip-NeRF adds scale-aware anti-aliasing to grid-based NeRF representations, removing jaggies and missing scene content while preserving fast training. It combines mip-NeRF 360’s cone-based rendering with iNGP’s pyramid of grids, using multisampling and scale-dependent downweighting for spatial anti-aliasing, plus a smooth, prefiltered interlevel loss that suppresses z-aliasing along rays. The resulting model has error rates 8%–77% lower than either prior technique and trains 24× faster than mip-NeRF 360, with strong performance on both single-scale and multiscale 360 benchmarks. By explicitly addressing both spatial aliasing and aliasing along the ray in grid-based architectures, this work makes fast NeRF training more robust for view synthesis.

Abstract

Neural Radiance Field training can be accelerated through the use of grid-based representations in NeRF's learned mapping from spatial coordinates to colors and volumetric density. However, these grid-based approaches lack an explicit understanding of scale and therefore often introduce aliasing, usually in the form of jaggies or missing scene content. Anti-aliasing has previously been addressed by mip-NeRF 360, which reasons about sub-volumes along a cone rather than points along a ray, but this approach is not natively compatible with current grid-based techniques. We show how ideas from rendering and signal processing can be used to construct a technique that combines mip-NeRF 360 and grid-based models such as Instant NGP to yield error rates that are 8% - 77% lower than either prior technique, and that trains 24x faster than mip-NeRF 360.

Paper Structure (24 sections, 14 equations, 10 figures, 5 tables, 1 algorithm)

Figures (10)

  • Figure 1: (a) Test-set images from the mip-NeRF 360 dataset (Barron et al., 2022) with renderings from (b) our model and (c, d, e) three state-of-the-art baselines. Our model accurately recovers thin structures and finely detailed foliage, while the baselines either oversmooth or exhibit aliasing in the form of jaggies and missing scene content. PSNR values for each patch are inset.
  • Figure 2: Here we show a toy 1-dimensional iNGP (Müller et al., 2022) with 1 feature per scale. Each subplot represents a different strategy for querying the iNGP at all coordinates along the $x$ axis: imagine a Gaussian moving left to right, where each line is the iNGP feature for each coordinate, and where each color is a different scale in the iNGP. (a) The naive solution of querying the Gaussian's mean results in features with piecewise-linear kinks, where the high frequencies past the bandwidth of the Gaussian are large and inaccurate. (b) The true solution, obtained by convolving the iNGP features with a Gaussian (an intractable solution in practice), results in coarse features that are smooth but informative and fine features that are near 0. (c) We can suppress unreliable high frequencies by downweighting them based on the scale of the Gaussian (color bands behind each feature indicate the downweighting), but this results in unnaturally sharp discontinuities in coarse features. (d) Alternatively, supersampling produces reasonable coarse-scale features but erratic fine-scale features. (e) We therefore multisample isotropic sub-Gaussians (5 shown here) and use each sub-Gaussian's scale to downweight frequencies.
  • Figure 3: Here we show a toy 3D ray with an exaggerated pixel width (viewed along the ray as an inset) divided into 4 frustums denoted by color. We multisample each frustum with a hexagonal pattern that matches the frustum's first and second moments. Each pattern is rotated around the ray and flipped along the ray (a) randomly when training and (b) deterministically when rendering.
  • Figure 4: Here we visualize the problem of $z$-aliasing. Left: we have a scene where 2 training cameras face a narrow red chair in front of a gray wall. Middle: As we sweep a test camera between those training cameras, we see that the baseline algorithm (top) "misses" or "hits" the chair depending on its distance and therefore introduces tearing artifacts, while our model (bottom) consistently "hits" the chair to produce artifact-free renderings. Right: This is because the baseline (top) has learned non-smooth proposal distributions due to aliasing in its supervision, while our model (bottom) correctly predicts proposal distributions that capture both the foreground and the background at all depths due to our anti-aliased loss function.
  • Figure 5: Computing our anti-aliased loss requires that we smooth and resample a NeRF histogram $(\mathbf{s}, \mathbf{w})$ into the same set of endpoints as a proposal histogram $(\hat{\mathbf{s}}, \hat{\mathbf{w}})$, which we outline here. (1) We divide $\mathbf{w}$ by the size of each interval in $\mathbf{s}$ to yield a piecewise constant PDF that integrates to $\leq 1$. (2) We convolve that PDF with a rectangular pulse to obtain a piecewise linear PDF. (3) This PDF is integrated to produce a piecewise-quadratic CDF that is queried via piecewise quadratic interpolation at each location in $\hat{\mathbf{s}}$. (4) By taking the difference between adjacent interpolated values we obtain $\mathbf{w}^{\hat{\mathbf{s}}}$, which are the NeRF histogram weights $\mathbf{w}$ resampled into the endpoints of the proposal histogram $\hat{\mathbf{s}}$. (5) After resampling, we evaluate our loss $\mathcal{L}_{\mathrm{prop}}$ as an element-wise function of $\mathbf{w}^{\hat{\mathbf{s}}}$ and $\hat{\mathbf{w}}$, as they share a common coordinate space.
  • ...and 5 more figures
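
The resampling in steps (1) through (4) of the Figure 5 caption can be sketched in a few lines of dependency-free Python. This is an illustrative sketch under stated assumptions, not the authors' implementation: the function names `blurred_cdf_fn` and `resample_blurred` and the pulse half-width parameter `r` are hypothetical, and the loss in step (5) is left out because its exact form is not given here. Instead of building the piecewise-quadratic CDF knot by knot, the sketch uses an equivalent closed form: box-blurring a PDF and then integrating it is the same as box-averaging the unblurred CDF $F$, so the blurred CDF is $Q(x) = \big(G(x + r) - G(x - r)\big) / (2r)$, where $G(x) = \int_{-\infty}^{x} F(u)\,du$.

```python
from bisect import bisect_right
from itertools import accumulate


def blurred_cdf_fn(s, w, r):
    """Return Q(x), the CDF of histogram (s, w) after a box blur of half-width r.

    s: sorted interval endpoints (len n + 1); w: per-interval weights (len n).
    Hypothetical helper for illustration only.
    """
    widths = [b - a for a, b in zip(s[:-1], s[1:])]
    pdf = [wi / di for wi, di in zip(w, widths)]   # step (1): piecewise-constant PDF
    cdf = [0.0] + list(accumulate(w))              # unblurred CDF F at each endpoint
    # G(x) = integral of F up to x, tabulated at each endpoint (piecewise quadratic).
    G = [0.0]
    for i, d in enumerate(widths):
        G.append(G[-1] + cdf[i] * d + 0.5 * pdf[i] * d * d)

    def int_cdf(x):
        if x <= s[0]:
            return 0.0                             # F is 0 before the first endpoint
        if x >= s[-1]:
            return G[-1] + cdf[-1] * (x - s[-1])   # F is constant after the last one
        i = bisect_right(s, x) - 1                 # interval containing x
        t = x - s[i]
        return G[i] + cdf[i] * t + 0.5 * pdf[i] * t * t

    def Q(x):
        # Steps (2) and (3) combined: box-average F over [x - r, x + r].
        return (int_cdf(x + r) - int_cdf(x - r)) / (2.0 * r)

    return Q


def resample_blurred(s, w, s_hat, r):
    """Step (4): blurred weights of (s, w) resampled into the endpoints s_hat."""
    Q = blurred_cdf_fn(s, w, r)
    q = [Q(x) for x in s_hat]
    return [b - a for a, b in zip(q[:-1], q[1:])]
```

As a quick check, take a histogram with all of its mass in the interval $[1, 2]$: `resample_blurred([0.0, 1.0, 2.0, 3.0], [0.0, 1.0, 0.0], [-1.0, 1.0, 2.0, 4.0], 0.5)` spreads that mass into a triangle on $[0.5, 2.5]$ and returns weights of about `[0.125, 0.75, 0.125]`, which still sum to 1. The resampled weights share the proposal's endpoints, so they can be compared element-wise with $\hat{\mathbf{w}}$ as step (5) describes.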