Neural Point-Based Graphics

Kara-Ali Aliev, Artem Sevastopolsky, Maria Kolos, Dmitry Ulyanov, Victor Lempitsky

TL;DR

This work introduces neural point-based graphics, which attaches to each point of a raw point cloud a learnable descriptor encoding local geometry and appearance, and pairs the point cloud with a neural renderer that synthesizes photorealistic images from novel viewpoints without surface reconstruction. A progressive multi-scale rendering framework and joint optimization of the descriptors and the renderer enable robust view synthesis across diverse scenes, including those that are challenging for meshing. The authors demonstrate competitive or superior performance relative to mesh-based and neural rendering baselines on ScanNet, human portraits, and object scenes, with particular advantages on thin structures and incomplete geometry. The approach also supports scene editing, and the paper discusses practical considerations such as anti-aliasing, pointing to a scalable, mesh-free path for high-quality neural rendering from simple point primitives.

Abstract

We present a new point-based approach for modeling the appearance of real scenes. The approach uses a raw point cloud as the geometric representation of a scene, and augments each point with a learnable neural descriptor that encodes local geometry and appearance. A deep rendering network is learned in parallel with the descriptors, so that new views of the scene can be obtained by passing the rasterizations of a point cloud from new viewpoints through this network. The input rasterizations use the learned descriptors as point pseudo-colors. We show that the proposed approach can be used for modeling complex scenes and obtaining their photorealistic views, while avoiding explicit surface estimation and meshing. In particular, compelling results are obtained for scenes scanned using hand-held commodity RGB-D sensors as well as standard RGB cameras, even in the presence of objects that are challenging for standard mesh-based modeling.
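
The rasterization step described in the abstract can be illustrated with a small NumPy sketch. It is not the authors' implementation: the function names `rasterize_descriptors` and `rasterize_pyramid`, the pinhole camera convention (intrinsics `K`, rotation `R`, translation `t`), and the way intrinsics are rescaled per pyramid level are all assumptions made for illustration. Each pixel receives the descriptor of the nearest point that projects into it (a z-buffer test), and the same point cloud is rasterized at several resolutions, as in the paper's system overview.

```python
import numpy as np


def rasterize_descriptors(points, descriptors, K, R, t, height, width):
    """Z-buffer rasterization with descriptors as pseudo-colors (illustrative).

    points: (N, 3) world coordinates; descriptors: (N, D) per-point vectors.
    K, R, t: assumed pinhole intrinsics and world-to-camera pose.
    Returns an (height, width, D) image and an (height, width) depth map.
    """
    cam = points @ R.T + t                      # world -> camera coordinates
    in_front = cam[:, 2] > 1e-6                 # drop points behind the camera
    cam, desc = cam[in_front], descriptors[in_front]

    proj = cam @ K.T                            # perspective projection
    u = np.floor(proj[:, 0] / proj[:, 2]).astype(np.int64)
    v = np.floor(proj[:, 1] / proj[:, 2]).astype(np.int64)
    z = cam[:, 2]

    inside = (u >= 0) & (u < width) & (v >= 0) & (v < height)
    u, v, z, desc = u[inside], v[inside], z[inside], desc[inside]

    image = np.zeros((height, width, descriptors.shape[1]), dtype=descriptors.dtype)
    depth = np.full((height, width), np.inf)
    if z.size == 0:
        return image, depth

    # Sort by pixel index, then by depth, and keep the first (nearest) point
    # of every pixel. This avoids relying on duplicate-index write order.
    pix = v * width + u
    order = np.lexsort((z, pix))
    pix_sorted = pix[order]
    first = np.ones(pix_sorted.shape[0], dtype=bool)
    first[1:] = pix_sorted[1:] != pix_sorted[:-1]
    winners = order[first]

    image[v[winners], u[winners]] = desc[winners]
    depth[v[winners], u[winners]] = z[winners]
    return image, depth


def rasterize_pyramid(points, descriptors, K, R, t, height, width, num_scales=3):
    """Rasterize at progressively halved resolutions (hypothetical helper)."""
    pyramid = []
    for s in range(num_scales):
        factor = 2 ** s
        K_s = K.astype(float).copy()
        K_s[:2] /= factor                       # assumed intrinsics rescaling
        pyramid.append(
            rasterize_descriptors(points, descriptors, K_s, R, t,
                                  height // factor, width // factor)
        )
    return pyramid
```

In the method described above, such rasterizations would be fed to the rendering network, and both the network and the descriptors would be optimized by backpropagation. This NumPy sketch covers only the forward rasterization and is not differentiable.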

Paper Structure

This paper contains 14 sections, 2 equations, 9 figures, 2 tables.

Figures (9)

  • Figure 1: Given a set of RGB views and a point cloud (top-left), our approach fits a neural descriptor to each point (top-middle), after which new views of a scene can be rendered (top-right). The method works for a variety of scenes including 3D portraits (top) and interiors (bottom).
  • Figure 2: An overview of our system. Given the point cloud $\mathbf{P}$ with neural descriptors $\mathbf{D}$ and camera parameters $C$, we rasterize the points with z-buffer at several resolutions, using descriptors as pseudo-colors. We then pass the rasterizations through the U-net-like rendering network to obtain the resulting image. Our model is fit to new scene(s) by optimizing the parameters of the rendering network and the neural descriptors by backpropagating the perceptual loss function.
  • Figure 3: Representative samples from the People dataset used in our experiments.
  • Figure 4: Comparative results on the 'Studio' dataset (from Dai17b). We show the textured mesh, the colored point cloud, the results of three neural rendering systems (including ours), and the ground truth. Our system can successfully reproduce details that pose a challenge for meshing, such as the wheel of the bicycle.
  • Figure 5: Results on the holdout frames from the 'Person 1' and 'Person 2' scenes. Our approach successfully transfers fine details to new views.
  • ...and 4 more figures