Advances in Neural Rendering
Ayush Tewari, Justus Thies, Ben Mildenhall, Pratul Srinivasan, Edgar Tretschk, Yifan Wang, Christoph Lassner, Vincent Sitzmann, Ricardo Martin-Brualla, Stephen Lombardi, Tomas Simon, Christian Theobalt, Matthias Niessner, Jonathan T. Barron, Gordon Wetzstein, Michael Zollhoefer, Vladislav Golyanik
TL;DR
Neural rendering aims to synthesize photo-realistic imagery by learning 3D scene representations that integrate with differentiable image formation, yielding 3D-consistent novel-view synthesis. The field centers on the neural radiance field (NeRF) paradigm and volumetric rendering, in which a coordinate-based MLP represents density and radiance and is trained from 2D observations via differentiable rendering. This state-of-the-art report (STAR) surveys scene representations (surfaces and volumes, implicit and explicit), rendering strategies (ray casting, rasterization), and optimization practices, then reviews advances in static and dynamic view synthesis, generalization, editing, relighting, light fields, and engineering frameworks. Highlighted contributions include speedups (PlenOctrees, Instant-NGP), generalization via local or global conditioning and latent codes, and controllable dynamic NeRFs. Open challenges remain in scalability, interpretability, and the societal impact of photo-realistic synthetic media. Overall, neural rendering is poised to transform content creation and visualization by offering strong 3D control learned from 2D data, but it also demands careful attention to ethics, robustness, and computational cost.
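To make the volumetric rendering step mentioned above concrete, the following is a minimal sketch of the standard front-to-back compositing quadrature used in NeRF-style methods. It is an illustration under stated assumptions, not code from the report: the function name `composite_ray` and its arguments are hypothetical, and the per-sample densities and colors are hard-coded here, whereas in an actual method they would be predicted by the coordinate-based MLP.

```python
import math

def composite_ray(densities, colors, deltas):
    """Alpha-composite samples along one ray, front to back.

    densities: non-negative volume densities, one per sample (assumed given;
               in a NeRF-style method an MLP would predict them)
    colors:    (r, g, b) tuples, one per sample (likewise assumed given)
    deltas:    distances between adjacent samples along the ray
    Returns the composited (r, g, b) color and the per-sample weights.
    """
    transmittance = 1.0  # fraction of light not yet absorbed along the ray
    pixel = [0.0, 0.0, 0.0]
    weights = []
    for sigma, rgb, delta in zip(densities, colors, deltas):
        alpha = 1.0 - math.exp(-sigma * delta)  # opacity of this segment
        w = transmittance * alpha               # contribution of this sample
        weights.append(w)
        for k in range(3):
            pixel[k] += w * rgb[k]
        transmittance *= 1.0 - alpha            # attenuate for later samples
    return tuple(pixel), weights

# Toy ray: empty space, then a nearly opaque red sample that hides a blue one.
pixel, weights = composite_ray(
    densities=[0.0, 1e3, 1e3],
    colors=[(0.0, 1.0, 0.0), (1.0, 0.0, 0.0), (0.0, 0.0, 1.0)],
    deltas=[0.1, 0.1, 0.1],
)
print(pixel, weights)
```

Every operation in the loop is differentiable with respect to the densities and colors, which is what allows such a representation to be optimized from 2D images alone by comparing rendered pixels against observed ones.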
Abstract
Synthesizing photo-realistic images and videos is at the heart of computer graphics and has been the focus of decades of research. Traditionally, synthetic images of a scene are generated using rendering algorithms such as rasterization or ray tracing, which take specifically defined representations of geometry and material properties as input. Collectively, these inputs define the actual scene and what is rendered, and are referred to as the scene representation (where a scene consists of one or more objects). Example scene representations are triangle meshes with accompanying textures (e.g., created by an artist), point clouds (e.g., from a depth sensor), volumetric grids (e.g., from a CT scan), or implicit surface functions (e.g., truncated signed distance fields). The reconstruction of such a scene representation from observations using differentiable rendering losses is known as inverse graphics or inverse rendering. Neural rendering is closely related, and combines ideas from classical computer graphics and machine learning to create algorithms for synthesizing images from real-world observations. Neural rendering is a leap forward towards the goal of synthesizing photo-realistic image and video content. In recent years, we have seen immense progress in this field through hundreds of publications that show different ways to inject learnable components into the rendering pipeline. This state-of-the-art report on advances in neural rendering focuses on methods that combine classical rendering principles with learned 3D scene representations, often now referred to as neural scene representations. A key advantage of these methods is that they are 3D-consistent by design, enabling applications such as novel viewpoint synthesis of a captured scene. In addition to methods that handle static scenes, we cover neural scene representations for modeling non-rigidly deforming objects...
