Neural 4D Evolution under Large Topological Changes from 2D Images

AmirHossein Naghi Razlighi, Tiago Novello, Asen Nachkov, Thomas Probst, Danda Paudel

TL;DR

The paper introduces a new architecture that discretizes and encodes the deformation while learning the SDF, together with a technique for imposing temporal consistency, and proposes a rendering scheme for color prediction based on Gaussian splatting.

Abstract

Prior work has shown that the evolution of a known explicit 3D surface into a target surface can be learned from 2D images using an instantaneous flow field, even when the two surfaces differ substantially in topology. We are interested in capturing 4D shapes whose topology changes substantially over time. We find that a straightforward extension of the existing 3D method to the 4D case performs poorly. In this work, we address the challenges of extending 3D neural evolution to 4D under large topological changes by proposing two novel modifications: (i) a new architecture that discretizes and encodes the deformation and learns the SDF, and (ii) a technique for imposing temporal consistency. In addition, (iii) we propose a rendering scheme for color prediction based on Gaussian splatting. To enable learning directly from 2D images, we further propose a learning framework that disentangles geometry and appearance in RGB images. This disentanglement method, while useful for the 4D evolution problem we focus on, is also novel and applicable to static scenes. Extensive experiments on various data show strong results and, most importantly, open a new direction for reconstructing challenging scenes with significant topological changes and deformations. Our source code and dataset are publicly available at https://github.com/insait-institute/N4DE.
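
The temporal-consistency technique is not detailed in this summary, but the Figure 5 caption shows it acting through a regularizer on $\frac{\partial S_{\theta}}{\partial t}$. The following is a minimal sketch of such a penalty under stated assumptions: a squared-magnitude form averaged over sampled points and times, with the time derivative estimated by a central finite difference. The paper's exact norm, sampling strategy, and loss weight are not given here, and `temporal_regularizer`, `static_sphere`, and `growing_sphere` are hypothetical names.

```python
import math
from typing import Callable, Sequence

Point = Sequence[float]
SDFFn = Callable[[Point, float], float]


def temporal_regularizer(sdf: SDFFn, points: Sequence[Point], times: Sequence[float],
                         eps: float = 1e-3) -> float:
    """Mean of (dS/dt)^2 over all sampled (x, t) pairs.

    Assumption: dS/dt is approximated with a central finite difference; an
    autodiff framework would instead differentiate the network exactly.
    """
    total = 0.0
    count = 0
    for x in points:
        for t in times:
            ds_dt = (sdf(x, t + eps) - sdf(x, t - eps)) / (2.0 * eps)
            total += ds_dt ** 2
            count += 1
    return total / count


def static_sphere(x: Point, t: float) -> float:
    # Radius-0.5 sphere that does not move: dS/dt = 0.
    return math.sqrt(sum(c * c for c in x)) - 0.5


def growing_sphere(x: Point, t: float) -> float:
    # Sphere whose radius grows linearly in t: dS/dt = -0.1 everywhere.
    return math.sqrt(sum(c * c for c in x)) - (0.5 + 0.1 * t)


pts = [(0.1, 0.2, 0.3), (0.4, 0.0, 0.1)]
ts = [0.0, 0.5, 1.0]
print(temporal_regularizer(static_sphere, pts, ts))   # 0.0
print(temporal_regularizer(growing_sphere, pts, ts))  # approximately 0.01
```

Because this penalty vanishes only when $S_{\theta}$ does not change with $t$, adding it to the reconstruction loss would pull unsupervised timesteps toward the supervised shape, consistent with the Static Bracelet behavior described for Figure 5.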

Paper Structure

This paper contains 28 sections, 18 equations, 21 figures, 3 tables.

Figures (21)

  • Figure 1: Task scheme. Our method learns the deformation animation of an object between two frames with large topological changes between them.
  • Figure 2: Architecture of the SDF module. Each point $\textbf{x} \in [0,1]^3$ is encoded using (a) a HashGrid, presented in Section \ref{sec:sdf_head}. The coordinate encodings (of dimension $F \times L$) are then concatenated with the positional encoding of time, $\gamma(t)$, and fed into an MLP (the SDF Head). (b) The signed distance value of each point is estimated, and the Lagrangian representation of the mesh is extracted via Marching Cubes [marching-cubes]. As our experiments show, the model can learn a continuous representation of the deformation with respect to time.
  • Figure 3: Overall pipeline for training and inference with the rendering module. In each iteration, the surface points estimated by the SDF Head are extracted via Marching Cubes [marching-cubes]. They are then encoded with a HashGrid encoder [instant-ngp], and the time embedding (obtained via positional encoding) is concatenated to them. These features pass through the rendering module, which estimates the splats' appearance properties (all except position and scale). At inference time, we use the final geometry and splat properties to perform color interpolation and render the colored geometry.
  • Figure 4: Overview of the proposed pipeline, applied to a Stanford bunny. A geometric representation and an appearance representation are extracted using the methods presented in Sec \ref{sec:sdf_head} and Sec \ref{sec:rendering_head}, respectively. The final colored mesh is obtained by combining these two representations.
  • Figure 5: Estimated meshes at different timesteps for the Static Bracelet scene. The scene is supervised only at $t=0$; owing to the $\frac{\partial S_{\theta}}{\partial t}$ regularizer, it learns to remain constant over time.
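
The Figure 2 caption describes the SDF module's data flow: a HashGrid encoding of $\textbf{x}$ (dimension $F \times L$) is concatenated with a positional encoding $\gamma(t)$ and passed through an MLP that outputs a signed distance. The pure-Python sketch below traces that flow under several assumptions. $\gamma(t)$ is taken to be the NeRF-style sinusoidal encoding. `placeholder_coord_features` is a hypothetical stand-in for the HashGrid that performs no hash-table lookup. `TinyMLP` uses random, untrained weights. The sizes (L = 4 levels, F = 2 features, 6 frequencies) are illustrative, not the paper's settings.

```python
import math
import random
from typing import List, Sequence


def positional_encoding(t: float, num_freqs: int) -> List[float]:
    """Assumed NeRF-style gamma(t): [sin(2^k*pi*t), cos(2^k*pi*t)] for k = 0..num_freqs-1."""
    out: List[float] = []
    for k in range(num_freqs):
        freq = (2.0 ** k) * math.pi
        out.extend([math.sin(freq * t), math.cos(freq * t)])
    return out


def placeholder_coord_features(x: Sequence[float], n_levels: int, n_feats: int) -> List[float]:
    """Hypothetical stand-in for the HashGrid encoding of x in [0,1]^3.

    It performs no hash-table lookup; it only returns a deterministic vector of
    the same size (n_levels * n_feats, i.e. L * F) so the data flow can be shown.
    """
    feats: List[float] = []
    for level in range(n_levels):
        res = 2 ** (level + 1)  # higher levels use higher "resolution" frequencies
        for f in range(n_feats):
            feats.append(math.sin(res * x[f % 3] + f))
    return feats


class TinyMLP:
    """Two-layer ReLU MLP with random, untrained weights standing in for the SDF head."""

    def __init__(self, in_dim: int, hidden: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.in_dim = in_dim
        self.w1 = [[rng.gauss(0.0, 1.0 / math.sqrt(in_dim)) for _ in range(in_dim)]
                   for _ in range(hidden)]
        self.b1 = [0.0] * hidden
        self.w2 = [rng.gauss(0.0, 1.0 / math.sqrt(hidden)) for _ in range(hidden)]
        self.b2 = 0.0

    def __call__(self, z: Sequence[float]) -> float:
        if len(z) != self.in_dim:
            raise ValueError(f"expected {self.in_dim} inputs, got {len(z)}")
        hidden = [max(0.0, sum(w * v for w, v in zip(row, z)) + b)
                  for row, b in zip(self.w1, self.b1)]
        return sum(w * h for w, h in zip(self.w2, hidden)) + self.b2  # scalar SDF value


def sdf_head_input(x: Sequence[float], t: float, n_levels: int, n_feats: int,
                   num_freqs: int) -> List[float]:
    """Concatenate the (L * F)-dim coordinate encoding with gamma(t)."""
    return placeholder_coord_features(x, n_levels, n_feats) + positional_encoding(t, num_freqs)


L_LEVELS, F_FEATS, N_FREQS = 4, 2, 6
mlp = TinyMLP(in_dim=L_LEVELS * F_FEATS + 2 * N_FREQS, hidden=16)
x = (0.25, 0.5, 0.75)
for t in (0.0, 0.5, 1.0):
    z = sdf_head_input(x, t, L_LEVELS, F_FEATS, N_FREQS)
    print(t, len(z), mlp(z))  # len(z) is 4 * 2 + 2 * 6 = 20
```

In a full model, these SDF values would be evaluated on a grid at each $t$ and passed to Marching Cubes to extract the mesh; the sketch stops at the scalar output.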