
WavePlanes: Compact Hex Planes for Dynamic Novel View Synthesis

Adrian Azzarelli, Nantheera Anantrasirichai, David R Bull

TL;DR

WavePlanes is a fast, compact hex plane representation for dynamic scenes, applicable to both Neural Radiance Fields and Gaussian Splatting methods. It exploits the sparsity of wavelet coefficients by hard-thresholding the wavelet planes and storing the nonzero coefficients and their locations on each plane in a hash map.
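The thresholding and hash-map storage described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes each wavelet plane is a 2-D list of floats and uses a Python `dict` as the hash map keyed by `(row, col)`. The names `compress_plane` and `decompress_plane`, and any threshold value you pass, are hypothetical.

```python
from typing import Dict, List, Tuple

Plane = List[List[float]]
SparsePlane = Dict[Tuple[int, int], float]  # (row, col) -> surviving coefficient


def compress_plane(plane: Plane, threshold: float) -> SparsePlane:
    """Hard-threshold a wavelet coefficient plane and keep the survivors in a hash map.

    Hard thresholding zeroes every coefficient with |c| <= threshold and leaves
    the rest unchanged, so only surviving values and their locations are stored.
    """
    sparse: SparsePlane = {}
    for r, row in enumerate(plane):
        for c, value in enumerate(row):
            if abs(value) > threshold:
                sparse[(r, c)] = value
    return sparse


def decompress_plane(sparse: SparsePlane, shape: Tuple[int, int]) -> Plane:
    """Rebuild the dense (thresholded) plane; locations missing from the map become zero."""
    rows, cols = shape
    plane = [[0.0] * cols for _ in range(rows)]
    for (r, c), value in sparse.items():
        plane[r][c] = value
    return plane
```

For example, `compress_plane([[0.9, 0.01, 0.0], [-0.02, 0.0, -0.5]], 0.05)` keeps only `{(0, 0): 0.9, (1, 2): -0.5}`. The plane shape is not encoded in the map, so in this sketch it has to be stored alongside it; the saving grows with the fraction of near-zero coefficients, offset by the cost of storing each location.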

Abstract

Dynamic Novel View Synthesis (Dynamic NVS) extends NVS to model moving 3-D scenes. However, current methods are resource-intensive and difficult to compress. To address this, we present WavePlanes, a fast and more compact hex plane representation applicable to both Neural Radiance Fields and Gaussian Splatting methods. Rather than modeling many feature scales separately (as done previously), we use the inverse discrete wavelet transform to reconstruct features at varying scales. This yields a more compact representation and allows us to explore wavelet-based compression schemes for further gains. The proposed compression scheme exploits the sparsity of wavelet coefficients by applying hard thresholding to the wavelet planes and storing the nonzero coefficients and their locations on each plane in a hash map. Compared to the state of the art (SotA), WavePlanes is significantly smaller, less resource-demanding, and competitive in reconstruction quality. Compared to small SotA models, WavePlanes outperforms them in both model size and novel-view quality.
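To illustrate how one set of wavelet coefficients can yield feature planes at several scales through the inverse discrete wavelet transform, here is a sketch using the orthonormal 2-D Haar wavelet. The choice of Haar, the band ordering, and the helper names `haar_idwt2` and `reconstruct_scales` are assumptions for illustration; the paper's wavelet family and its exact handling of levels may differ.

```python
from typing import List, Tuple

Plane = List[List[float]]
# Detail bands (assumed ordering): horizontal (column differences),
# vertical (row differences), diagonal.
Bands = Tuple[Plane, Plane, Plane]


def haar_idwt2(ll: Plane, details: Bands) -> Plane:
    """One level of the orthonormal 2-D Haar inverse DWT: (h, w) bands -> (2h, 2w) plane."""
    dh, dv, dd = details
    h, w = len(ll), len(ll[0])
    out = [[0.0] * (2 * w) for _ in range(2 * h)]
    for i in range(h):
        for j in range(w):
            a, b, c, d = ll[i][j], dh[i][j], dv[i][j], dd[i][j]
            # Each coarse coefficient plus its three details fills one 2x2 block.
            out[2 * i][2 * j] = (a + b + c + d) / 2
            out[2 * i][2 * j + 1] = (a - b + c - d) / 2
            out[2 * i + 1][2 * j] = (a + b - c - d) / 2
            out[2 * i + 1][2 * j + 1] = (a - b - c + d) / 2
    return out


def reconstruct_scales(coarse: Plane, detail_levels: List[Bands]) -> List[Plane]:
    """Apply the IDWT level by level and return the plane at every scale, coarse to fine."""
    scales = [coarse]
    current = coarse
    for details in detail_levels:
        current = haar_idwt2(current, details)
        scales.append(current)
    return scales
```

For instance, `haar_idwt2([[5.0]], ([[-1.0]], [[-2.0]], [[0.0]]))` returns `[[1.0, 2.0], [3.0, 4.0]]`. In this sketch only the coarse plane and the detail bands are stored; every intermediate scale is derived from them, which is why the compression step can operate directly on the coefficients.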

Paper Structure (10 sections, 4 equations, 6 figures, 4 tables)

Figures (6)

  • Figure 1: Left: Visual and quantitative comparison of our method applied to K-Planes fridovich2023k and 4D-GS wu20244dgs. Right: Quantitative comparison of model size (point radius), training time (x-axis) and quality (y-axis) on the entire D-NeRF dataset pumarola2021d
  • Figure 2: Model Pipeline: (a) N-level wavelet planes are transformed using the IDWT into feature planes. (b) 4-D samples are projected onto each plane and bilinearly interpolated over multiple scales. (c) Volumetric features are recovered by fusing the space-only and space-time features. (d) Features are linearly decoded into color and density values. (e) A 2-D pixel is rendered from the 3-D volume using the NeRF volumetric rendering function. (f) A loss and weighted regularization are used for training. (g) After training, we compress the nonzero wavelet coefficients.
  • Figure 3: Qualitative real video results on the Cooked Salmon DyNeRF scene li2022neural. $2\times$ and $8\times$ indicate the downsampling factor used for the IST weights. Due to limited RAM, we use $8\times$ downsampling.
  • Figure 4: Zoomed visual comparisons of the Lego scene pumarola2021d with PSNR / SSIM / model size
  • Figure 5: Zoomed visual comparisons of small NeRFs on fast motion
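Steps (b), (c) and (e) of the Figure 2 pipeline can be sketched for a single ray as below. This is an illustration under loud assumptions, not the authors' code: each plane holds scalar (single-channel) features, coordinates are normalized to [0, 1], the space-only planes (xy, xz, yz) and space-time planes (xt, yt, zt) are fused by element-wise product as in K-Planes, and per-scale features are concatenated. WavePlanes' actual fusion may differ. `render_ray` implements the standard NeRF quadrature C = Σ T_i (1 − exp(−σ_i δ_i)) c_i with T_i = Π_{j<i} exp(−σ_j δ_j). All function names are hypothetical.

```python
import math
from itertools import combinations
from typing import Dict, List, Sequence, Tuple

Plane = List[List[float]]
# Axis indices: 0 = x, 1 = y, 2 = z, 3 = t -> six planes: xy, xz, xt, yz, yt, zt.
PLANE_PAIRS: List[Tuple[int, int]] = list(combinations(range(4), 2))


def bilerp(plane: Plane, u: float, v: float) -> float:
    """Bilinearly interpolate `plane` at normalized coords u (columns) and v (rows) in [0, 1]."""
    rows, cols = len(plane), len(plane[0])
    x, y = u * (cols - 1), v * (rows - 1)
    x0, y0 = int(x), int(y)
    x1, y1 = min(x0 + 1, cols - 1), min(y0 + 1, rows - 1)
    fx, fy = x - x0, y - y0
    top = plane[y0][x0] * (1 - fx) + plane[y0][x1] * fx
    bottom = plane[y1][x0] * (1 - fx) + plane[y1][x1] * fx
    return top * (1 - fy) + bottom * fy


def hexplane_feature(planes: Dict[Tuple[int, int], Plane], point: Sequence[float]) -> float:
    """Project a 4-D point (x, y, z, t) onto six planes and fuse (assumed K-Planes-style product)."""
    space, space_time = 1.0, 1.0
    for i, j in PLANE_PAIRS:
        value = bilerp(planes[(i, j)], point[i], point[j])
        if j == 3:  # plane includes the time axis
            space_time *= value
        else:
            space *= value
    return space * space_time


def multiscale_feature(planes_per_scale: List[Dict[Tuple[int, int], Plane]],
                       point: Sequence[float]) -> List[float]:
    """Concatenate the fused feature from every scale (assumption, as in K-Planes)."""
    return [hexplane_feature(planes, point) for planes in planes_per_scale]


def render_ray(sigmas: Sequence[float], colors: Sequence[Sequence[float]],
               deltas: Sequence[float]) -> List[float]:
    """NeRF volume rendering quadrature along one ray, front to back."""
    rgb = [0.0, 0.0, 0.0]
    transmittance = 1.0  # T_i: fraction of light that reaches sample i
    for sigma, color, delta in zip(sigmas, colors, deltas):
        alpha = 1.0 - math.exp(-sigma * delta)
        weight = transmittance * alpha
        for k in range(3):
            rgb[k] += weight * color[k]
        transmittance *= 1.0 - alpha
    return rgb
```

Step (d), the linear decoder mapping these features to density and color, is omitted here; in this sketch `render_ray` simply takes the decoded `sigmas` and `colors` as inputs.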