Differentiable Rendering: A Survey

Hiroharu Kato, Deniz Beker, Mihai Morariu, Takahiro Ando, Toru Matsuoka, Wadim Kehl, Adrien Gaidon

TL;DR

This survey maps the landscape of differentiable rendering by categorizing methods according to underlying 3D representations (mesh, voxel, point cloud, implicit), detailing analytical and approximated gradient strategies, and highlighting global illumination approaches. It surveys applications from single-view object and human reconstruction to adversarial scenarios and data labeling, and reviews major libraries (TensorFlow Graphics, Kaolin, PyTorch3D, Mitsuba 2) that enable DR research. The paper also discusses evaluation challenges, open problems, and practical limitations in speed, realism, and integration with learning-based methods, arguing that combining inductive 3D priors with differentiable rendering holds promise for scalable, 3D-aware vision systems. Overall, DR is presented as a rapidly evolving field poised to reduce 3D data requirements while enabling robust 3D understanding from 2D observations, with real-time and embedded deployments on the horizon.

Abstract

Deep neural networks (DNNs) have shown remarkable performance improvements on vision-related tasks such as object detection and image segmentation. Despite their success, they generally lack an understanding of the 3D objects that form the image, because it is not always possible to collect 3D information about the scene or to annotate it easily. Differentiable rendering is a novel field that allows gradients with respect to 3D objects to be calculated and propagated through images. It also reduces the need for 3D data collection and annotation, while enabling a higher success rate in various applications. This paper reviews the existing literature and discusses the current state of differentiable rendering, its applications, and open research problems.

Paper Structure

This paper contains 36 sections, 11 figures, 5 tables.

Figures (11)

  • Figure 1: Schematic overview of differentiable rendering. Best viewed in color. The top part shows a basic optimization pipeline using a differentiable renderer where the gradients of an objective function with respect to the scene parameters and known ground-truth are calculated. The bottom part shows a common self-supervision pipeline based on differentiable rendering. Here, the supervision signal is provided in the form of image evidence and the neural network is updated by backpropagating the error between the image and the rendering output.
  • Figure 2: Several operations that are performed inside a rendering function, given a pixel, its corresponding triangle and material defined on vertices of the triangle, camera parameters, and light configurations. The green boxes represent inputs and the yellow boxes represent outputs. Best viewed in color.
  • Figure 3: An image of $10 \times 7$ pixels that shows a scene composed of three triangles. The vertex colors of one are white and its vertex positions are denoted by $v_i^w$. The vertex colors of the other two are black and their vertex positions are denoted by $v_i^b$.
  • Figure 4: Differentiable rendering algorithms differ in how the geometric information along a ray is collected and aggregated. For voxels, geometric information is collected by checking intersections of a ray with each voxel. For meshes, unlike non-differentiable rendering, multiple polygons have to be associated with a single ray. For point clouds, there are several ways to measure the influence of a point on a ray by pseudo-sizing. For neural implicit functions, various sampling techniques have been proposed for efficiency. Aggregation methods mainly depend on whether geometric information is treated deterministically or probabilistically, and how occlusion is handled.
  • Figure 5: Standard training pipeline of learning single-view 3D object reconstruction from 2D images. Dashed rectangles represent training data.
  • ...and 6 more figures
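
The optimization pipeline in the top part of Figure 1 can be illustrated with a toy example. The sketch below is not taken from the paper or from any of the libraries it reviews: it assumes a hypothetical one-dimensional "scene" with a single parameter (the center of a segment), a hypothetical `render` function that uses a sigmoid to produce soft, differentiable pixel coverage, and a squared-error objective against a target image. The names `render`, `loss_and_grad`, and `fit_center`, along with the radius, sharpness, and learning-rate values, are illustrative assumptions.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def sign(x):
    return (x > 0) - (x < 0)


def render(center, radius=2.0, width=16, sharpness=1.0):
    """Toy renderer (an assumption for illustration): soft coverage of the
    segment [center - radius, center + radius], sampled at pixel centers."""
    return [
        sigmoid(sharpness * (radius - abs(x + 0.5 - center)))
        for x in range(width)
    ]


def loss_and_grad(center, target, radius=2.0, sharpness=1.0):
    """Squared-error loss between rendering and target, plus its analytical
    derivative with respect to the scene parameter `center`."""
    image = render(center, radius, len(target), sharpness)
    loss, grad = 0.0, 0.0
    for x, (p, t) in enumerate(zip(image, target)):
        # Chain rule through the sigmoid and the absolute value.
        dp_dc = p * (1.0 - p) * sharpness * sign(x + 0.5 - center)
        loss += (p - t) ** 2
        grad += 2.0 * (p - t) * dp_dc
    return loss, grad


def fit_center(target, init_center, steps=300, lr=0.3):
    """Plain gradient descent on the scene parameter."""
    center = init_center
    for _ in range(steps):
        _, grad = loss_and_grad(center, target)
        center -= lr * grad
    return center


if __name__ == "__main__":
    target = render(10.0)  # stands in for the observed image
    fitted = fit_center(target, init_center=5.0)
    print(f"recovered center: {fitted:.3f}")
```

The soft edge is what makes the example work: with a hard, binary coverage test the image would be piecewise constant in `center` and the gradient would be zero almost everywhere, so the optimizer would receive no signal.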