DaReNeRF: Direction-aware Representation for Dynamic Scenes

Ange Lou, Benjamin Planche, Zhongpai Gao, Yamin Li, Tianyu Luan, Hao Ding, Terrence Chen, Jack Noble, Ziyan Wu

TL;DR

To address redundancy introduced by the six real and six imaginary direction-aware wavelet coefficients, this work introduces a trainable masking approach, mitigating storage issues without significant performance decline.

Abstract

Addressing the intricate challenge of modeling and re-rendering dynamic scenes, most recent approaches have sought to simplify these complexities using plane-based explicit representations, overcoming the slow training time issues associated with methods like Neural Radiance Fields (NeRF) and implicit representations. However, the straightforward decomposition of 4D dynamic scenes into multiple 2D plane-based representations proves insufficient for re-rendering high-fidelity scenes with complex motions. In response, we present a novel direction-aware representation (DaRe) approach that captures scene dynamics from six different directions. This learned representation undergoes an inverse dual-tree complex wavelet transformation (DTCWT) to recover plane-based information. DaReNeRF computes features for each space-time point by fusing vectors from these recovered planes. Combining DaReNeRF with a tiny MLP for color regression and leveraging volume rendering in training yield state-of-the-art performance in novel view synthesis for complex dynamic scenes. Notably, to address redundancy introduced by the six real and six imaginary direction-aware wavelet coefficients, we introduce a trainable masking approach, mitigating storage issues without significant performance decline. Moreover, DaReNeRF maintains a 2x reduction in training time compared to prior art while delivering superior performance.
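
To make the masking idea concrete, here is a minimal sketch of how a mask could sparsify the direction-aware coefficients. The abstract does not specify the mask's form, so the thresholded-sigmoid mask, the `apply_mask` helper, and the tensor shapes below are all assumptions for illustration, not the authors' implementation.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def apply_mask(coeffs, mask_logits, threshold=0.5):
    """Zero out coefficients whose mask probability is not above `threshold`.

    `mask_logits` plays the role of the trainable mask parameters
    (assumed form: one logit per coefficient).
    """
    keep = sigmoid(mask_logits) > threshold
    return coeffs * keep, keep

rng = np.random.default_rng(0)
# Hypothetical layout: 6 directions x (real, imaginary) x channels x H x W
coeffs = rng.standard_normal((6, 2, 4, 8, 8))
mask_logits = rng.standard_normal(coeffs.shape)

sparse_coeffs, keep = apply_mask(coeffs, mask_logits)
kept_fraction = keep.mean()  # share of coefficients that would need storing
print(f"kept {kept_fraction:.1%} of coefficients")
```

Only the forward pass is shown. In training, the hard threshold would need a gradient surrogate (for example a straight-through estimator) so the mask logits can be learned; that part is omitted here.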

Paper Structure (28 sections, 14 equations, 19 figures, 16 tables)

Figures (19)

  • Figure 1: DaReNeRF performance on dynamic 3D scenes. Our proposed direction-aware representation excels by capturing features of dynamic scenes from six different directions—a capability beyond the reach of traditional discrete-wavelet representations, cf. sub-figure (a). Built upon this advanced representation, our NeRF method outperforms prior work in challenging dynamic scenarios while being competitive in terms of training time and model size, offering the best trade-off overall, cf. sub-figure (b).
  • Figure 2: Method Overview. Top: The regular DaReNeRF architecture comprises an approximation and 12 direction-aware coefficient maps for both spatial (e.g., $XY$) and spatial-temporal (e.g., $ZT$) plane-based representations. To compute features of points in space-time, it multiplies feature vectors extracted from paired planes (e.g., $XY$ and $ZT$), concatenates the multiplied results into a single vector, and then multiplies that vector by the learned tensor $V^{RF}$ to obtain the final features. RGB colors are regressed by a compact MLP, and images are synthesized via volumetric rendering. Bottom: The trainable mask is combined with the top architecture to create a sparse DaReNeRF. Each direction-aware representation and the approximation representation are assigned their own sparse masks. The sparse representation undergoes an inverse dual-tree complex wavelet transform to obtain plane-based spatial and spatial-temporal representations.
  • Figure 3: Analysis Filter Bank for the dual-tree complex wavelet transform.
  • Figure 4: Visual Comparison on Dynamic Scenes (Plenoptic Data). K-Planes and HexPlane are concurrent decomposition-based methods. As shown in the four zoomed-in patches, our method better reconstructs fine details and captures motion. To see the figure animated, please view the document with compatible software, e.g., Adobe Acrobat or KDE Okular.
  • Figure 5: Visual Comparison of Static Scenes on NSVF Data. Two representative patches are selected for closer inspection. Our method, free from the DWT limitations of shift variance and direction ambiguity, achieves superior texture reconstruction performance.
  • ...and 14 more figures
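
The feature computation described in the Figure 2 caption (multiply vectors from paired planes, concatenate, project with $V^{RF}$) can be sketched as follows. This is an illustrative sketch under assumptions: the planes are random stand-ins for those recovered by the inverse DTCWT, the caption only names the $XY$/$ZT$ pair so the other two pairings ($XZ$/$YT$, $YZ$/$XT$) are assumed, bilinear interpolation is assumed for sampling, and the names `bilinear_sample`, `fuse_features`, and `V_RF` are hypothetical.

```python
import numpy as np

def bilinear_sample(plane, u, v):
    """Sample a (C, H, W) plane at (u, v) in [0, 1]^2; u indexes W, v indexes H."""
    _, H, W = plane.shape
    x, y = u * (W - 1), v * (H - 1)
    x0, y0 = int(np.floor(x)), int(np.floor(y))
    x1, y1 = min(x0 + 1, W - 1), min(y0 + 1, H - 1)
    wx, wy = x - x0, y - y0
    top = (1 - wx) * plane[:, y0, x0] + wx * plane[:, y0, x1]
    bottom = (1 - wx) * plane[:, y1, x0] + wx * plane[:, y1, x1]
    return (1 - wy) * top + wy * bottom

def fuse_features(planes, point, V_RF):
    """Fuse plane features for one space-time point (x, y, z, t), each in [0, 1]."""
    coords = dict(zip("xyzt", point))
    # Assumed pairing: each spatial plane with its complementary space-time plane.
    pairs = [("xy", "zt"), ("xz", "yt"), ("yz", "xt")]
    products = []
    for spatial, temporal in pairs:
        f_s = bilinear_sample(planes[spatial], coords[spatial[0]], coords[spatial[1]])
        f_t = bilinear_sample(planes[temporal], coords[temporal[0]], coords[temporal[1]])
        products.append(f_s * f_t)           # element-wise product of the paired vectors
    return np.concatenate(products) @ V_RF   # (3C,) @ (3C, D) -> (D,)

rng = np.random.default_rng(0)
C, R, D = 4, 8, 16  # channels, plane resolution, output feature size (all hypothetical)
planes = {name: rng.standard_normal((C, R, R)) for name in ("xy", "xz", "yz", "xt", "yt", "zt")}
V_RF = rng.standard_normal((3 * C, D))

feat = fuse_features(planes, (0.3, 0.7, 0.5, 0.1), V_RF)
print(feat.shape)
```

In the method as described, a vector like `feat` would then be passed to the compact MLP for color regression; that stage and the volumetric rendering are not sketched here.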