DaRePlane: Direction-aware Representations for Dynamic Scene Reconstruction

Ange Lou, Benjamin Planche, Zhongpai Gao, Yamin Li, Tianyu Luan, Hao Ding, Meng Zheng, Terrence Chen, Ziyan Wu, Jack Noble

TL;DR

DaRePlane is a novel direction-aware representation approach that captures scene dynamics from six different directions and yields state-of-the-art performance in novel view synthesis for various complex dynamic scenes.

Abstract

Numerous recent approaches to modeling and re-rendering dynamic scenes leverage plane-based explicit representations, addressing the slow training times associated with models like neural radiance fields (NeRF) and Gaussian splatting (GS). However, merely decomposing 4D dynamic scenes into multiple 2D plane-based representations is insufficient for high-fidelity re-rendering of scenes with complex motions. In response, we present DaRePlane, a novel direction-aware representation approach that captures scene dynamics from six different directions. This learned representation undergoes an inverse dual-tree complex wavelet transform (DTCWT) to recover plane-based information. Within NeRF pipelines, DaRePlane computes features for each space-time point by fusing vectors from these recovered planes; the fused features are then passed to a tiny MLP for color regression. When applied to Gaussian splatting, DaRePlane computes the features of Gaussian points, which are then fed to a tiny multi-head MLP that predicts spatio-temporal deformation. Notably, to address the redundancy introduced by the six real and six imaginary direction-aware wavelet coefficients, we introduce a trainable masking approach that reduces storage requirements without significant performance decline. To demonstrate the generality and efficiency of DaRePlane, we test it on both regular and surgical dynamic scenes, for both NeRF and GS systems. Extensive experiments show that DaRePlane yields state-of-the-art performance in novel view synthesis for various complex dynamic scenes.
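The abstract does not spell out how the trainable mask is constructed. One common construction in compact neural-field work, assumed here rather than taken from the paper, gives every stored coefficient a learnable logit, keeps the coefficient when sigmoid(logit) >= 0.5, and trains the logits through that hard threshold with a straight-through estimator plus a sparsity penalty. The NumPy sketch below illustrates this under stated assumptions: the 13-map layout (one approximation map plus the real and imaginary parts of six oriented subbands), the shapes, and the helper names `apply_mask`, `mask_logit_grad`, and `sparsity_penalty` are all hypothetical.

```python
import numpy as np

# Hypothetical layout for one plane (e.g. XY): index 0 holds the approximation
# (lowpass) map, 1-6 the real parts and 7-12 the imaginary parts of the six
# oriented DTCWT subbands; C feature channels on an H x W grid.
N_MAPS, C, H, W = 13, 4, 8, 8


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def apply_mask(coeffs, mask_logits, threshold=0.5):
    """Forward pass: zero every coefficient whose mask probability is below threshold.

    Each coefficient has its own logit, so each of the 13 maps gets its own mask.
    """
    hard = (sigmoid(mask_logits) >= threshold).astype(coeffs.dtype)
    return coeffs * hard, hard


def mask_logit_grad(coeffs, mask_logits, upstream):
    """Straight-through estimator for the logits.

    The hard threshold has zero gradient almost everywhere, so its derivative is
    replaced by that of the sigmoid. `upstream` is dLoss/d(masked coeffs).
    """
    s = sigmoid(mask_logits)
    return upstream * coeffs * s * (1.0 - s)


def sparsity_penalty(mask_logits):
    """Mean mask probability; adding weight * penalty to the loss favors sparser masks."""
    return float(sigmoid(mask_logits).mean())


rng = np.random.default_rng(0)
coeffs = rng.normal(size=(N_MAPS, C, H, W))
mask_logits = rng.normal(size=(N_MAPS, C, H, W))

masked, hard = apply_mask(coeffs, mask_logits)
print(f"kept {hard.mean():.1%} of {hard.size} coefficients")
```

In a scheme like this, only the coefficients whose mask is 1 would need to be stored before the inverse DTCWT, which is where a storage saving would come from; the weight on the sparsity penalty trades storage against reconstruction fidelity.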

Paper Structure

This paper contains 41 sections, 17 equations, 23 figures, 23 tables.

Figures (23)

  • Figure 1: Performance of dynamic NeRF and Gaussian splatting (GS) with DaRePlane on 4D scenes. Our direction-aware representation excels by capturing features of dynamic scenes from six different directions, a capability beyond the reach of traditional discrete-wavelet representations (cf. sub-figure (a)). Built upon this representation, our NeRF method, first introduced in [lou2024darenerf], outperforms prior work in challenging 4D scenarios while remaining competitive in training time and model size, offering the best overall trade-off (cf. sub-figure (b)). Similar results for our GS solution are shown in the GS results figure (fig:GS_dareplane).
  • Figure 2: Method Overview. (a) Given a sequence of images, NeRF and GS initialize spatial-temporal points and a set of 3D Gaussians, respectively. Voxel features of these points (for NeRF) or Gaussians (for GS) are then computed by querying voxel planes in DaRePlane. These features are subsequently fed into the volumetric rendering process (for NeRF) or the splatting process (for GS) to synthesize the final images. Bottom: (b) NeRF: Feature vectors queried from DaRePlane are concatenated into a single vector, which is then multiplied by a learned tensor $V^{RF}$ to obtain the final point features. RGB colors are regressed by a compact MLP, and images are synthesized via volumetric rendering. (c) GS: The concatenated feature vector is decoded by a multi-head deformation decoder to obtain the deformation of the Gaussians at a specific timestamp $t$. The deformed Gaussians are then splatted to render the final images.
  • Figure 3: Analysis Filter Bank for the dual-tree complex wavelet transform.
  • Figure 4: DaRePlane and DaRePlane-S Overview. Top: The regular DaRePlane architecture comprises an approximation coefficient map and 12 direction-aware coefficient maps for both spatial (e.g., $XY$) and spatial-temporal (e.g., $ZT$) plane-based representations. To compute the features of points in space-time, it multiplies feature vectors extracted from paired planes (e.g., $XY$ and $ZT$). Bottom: A trainable mask is combined with the top architecture to create DaRePlane-S. Each direction-aware representation and the approximation representation is assigned its own sparse mask. The sparse representation undergoes an inverse dual-tree complex wavelet transform to obtain the plane-based spatial and spatial-temporal representations.
  • Figure 5: Visual Comparison on Dynamic Scenes (Plenoptic Data). K-Planes and HexPlane are concurrent decomposition-based methods. As shown in the four zoomed-in patches, our method better reconstructs fine details and captures motion. Please refer to the supplementary material for an animated version of this figure.
  • ...and 18 more figures
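The captions of Figures 2(b) and 4 describe querying the recovered planes, multiplying features from paired spatial and spatial-temporal planes (e.g., $XY$ with $ZT$), concatenating the products, and multiplying by $V^{RF}$. The NumPy sketch below follows that description for a single point. It is not the authors' code: it assumes the six planes have already been recovered by the inverse DTCWT, uses random values in place of learned ones, normalizes coordinates to [0, 1], and omits the color MLP. The names `bilinear`, `plane_feature`, `fuse`, `PAIRS`, and the axis-to-grid convention are hypothetical.

```python
import numpy as np

AXES = {"X": 0, "Y": 1, "Z": 2, "T": 3}
# Each spatial plane is paired with the complementary spatial-temporal plane.
PAIRS = [("XY", "ZT"), ("XZ", "YT"), ("YZ", "XT")]


def bilinear(plane, u, v):
    """Bilinearly sample a (C, H, W) plane at u (width axis) and v (height axis) in [0, 1]."""
    _, h, w = plane.shape
    x, y = u * (w - 1), v * (h - 1)
    x0, y0 = int(np.floor(x)), int(np.floor(y))
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    wx, wy = x - x0, y - y0
    top = (1 - wx) * plane[:, y0, x0] + wx * plane[:, y0, x1]
    bottom = (1 - wx) * plane[:, y1, x0] + wx * plane[:, y1, x1]
    return (1 - wy) * top + wy * bottom


def plane_feature(planes, name, p):
    """Project the point p = (x, y, z, t) onto plane `name` (e.g. "ZT") and sample it."""
    first, second = name
    return bilinear(planes[name], p[AXES[first]], p[AXES[second]])


def fuse(planes, v_rf, p):
    """Multiply paired-plane features, concatenate the three products, project by V^RF."""
    products = [plane_feature(planes, s, p) * plane_feature(planes, st, p) for s, st in PAIRS]
    return np.concatenate(products) @ v_rf


rng = np.random.default_rng(0)
C, D, RES = 8, 16, 32  # channels per plane, output feature size, plane resolution
planes = {name: rng.normal(size=(C, RES, RES)) for pair in PAIRS for name in pair}
v_rf = rng.normal(size=(3 * C, D))  # stands in for the learned tensor V^RF
feat = fuse(planes, v_rf, np.array([0.2, 0.5, 0.7, 0.1]))
print(feat.shape)  # (16,)
```

In practice the same query would be vectorized over a batch of points and written in an autodiff framework so that the planes (or their wavelet coefficients) and $V^{RF}$ can be learned; the per-point version above only makes the pairing and projection explicit.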