Modular Primitives for High-Performance Differentiable Rendering

Samuli Laine, Janne Hellsten, Tero Karras, Yeongho Seol, Jaakko Lehtinen, Timo Aila

TL;DR

The paper presents a high-performance, differentiable renderer built around four modular primitives that exploit deferred shading and hardware rasterization to produce accurate gradients at megapixel resolutions. By integrating with automatic differentiation frameworks via tensor-based inputs and OpenGL-CUDA interop, it achieves fast forward and backward passes, while analytic antialiasing and mipmapped texture filtering provide robust gradient signals for texture and geometry optimization. It demonstrates significant speedups over prior differentiable renderers and validates the approach on facial performance capture, achieving strong geometric correspondence in multi-view, high-resolution data. The work offers a practical, extensible framework for inverse rendering and generative modeling that can be extended to richer appearance models and global illumination in future work.

Abstract

We present a modular differentiable renderer design that yields performance superior to previous methods by leveraging existing, highly optimized hardware graphics pipelines. Our design supports all crucial operations in a modern graphics pipeline: rasterizing large numbers of triangles, attribute interpolation, filtered texture lookups, as well as user-programmable shading and geometry processing, all in high resolutions. Our modular primitives allow custom, high-performance graphics pipelines to be built directly within automatic differentiation frameworks such as PyTorch or TensorFlow. As a motivating application, we formulate facial performance capture as an inverse rendering problem and show that it can be solved efficiently using our tools. Our results indicate that this simple and straightforward approach achieves excellent geometric correspondence between rendered results and reference imagery.

Paper Structure

This paper contains 30 sections, 5 equations, 12 figures, 2 tables.

Figures (12)

  • Figure 1: A simple differentiable rendering pipeline with our proposed primitive operations highlighted in red. The input data for rendering (blue) may be generated by, e.g., a neural network if the pipeline is part of a larger computation graph. In simpler setups the geometry processing might include only the model/view/perspective transformations for vertex positions with other inputs being constants or learnable parameters. All intermediate buffers (green) are in image space. Connections with gradients are denoted by a white triangle. Channel counts are fixed only for vertex positions and indices, and in the intermediate buffers produced by the rasterization operation. There are no restrictions on the channel counts for vertex attributes, textures, related intermediate data, or the output image.
  • Figure 2: Filtered differentiable texture lookup with a non-constant texture. (a) At the beginning of the forward pass, prefiltered MIP levels $c_\mathit{lod}$ are constructed from the full-resolution texture $c[i,j]$ by repeated downsampling using a $2\times2$ box filter. (b) In the forward pass, each lookup $g=f(s,t,\mathit{lod})$ interpolates prefiltered values on the appropriate MIP level as determined by the size of the sample footprint. In the backward pass, we receive incoming gradients $\partial{L}/\partial{g}$. Texture coordinate gradients $\partial{L}/\partial{s}$ and $\partial{L}/\partial{t}$ for each lookup are computed from these and from the contents of the texels used in the interpolation. Simultaneously, texture image gradients $\partial{L}/\partial c_\mathit{lod}[i,j]$ are accumulated into each MIP level. In a trilinear lookup, these calculations are performed on two adjacent levels and weighted according to the fractional part of $\mathit{lod}$. (c) To produce outgoing full-resolution texture image gradients $\partial{L}/\partial{c}[i,j]$, we sum the accumulated gradients from all MIP levels.
  • Figure 3: Illustration of our analytic antialiasing method. A vertical silhouette edge $p,q$ passes between the centers of horizontally adjacent pixels $A$ and $B$. This is detected by the pixels having different triangle IDs rasterized into them. Pixel pair $A,B$ is processed together, and one of the following cases may occur. (a) The edge crosses the segment connecting the pixel centers inside pixel $B$, causing the color of $A$ to blend into $B$. (b) The crossing happens inside pixel $A$, so blending is done in the opposite direction. To approximate the geometric coverage between surfaces, the blending factor is a linear function of the location of the crossing point, ranging from zero at the midpoint to 50% at the pixel center. This antialiasing method is differentiable because the resulting pixel colors are continuous functions of the positions of $p$ and $q$.
  • Figure 4: To validate that our visibility gradients provide useful information even for small triangles, we infer vertex positions and colors of a simple mesh at extremely low resolutions. The geometry of the current solution is superimposed on the rasterized images for illustration purposes only. The rightmost column shows the final, optimized mesh rendered at high resolution. At $4\times4$ resolution, the average triangle area is only 0.54 pixels. The optimization nonetheless converges to the correct solution, albeit more slowly than at higher resolutions. At $2\times2$ resolution the optimization fails to converge.
  • Figure 5: Convergence of the cube shape and color optimization test (average of 10 successful optimizations). The vertical axis shows the average distance between vertices and their true positions in the unit cube. The solid curves indicate convergence in the continuous coloring mode (Figure 4), and the dashed curves correspond to the discontinuous coloring mode. As expected, the latter is somewhat more difficult to optimize. Note the logarithmic horizontal axis.
  • ...and 7 more figures
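
As a rough illustration of steps (a) and (c) of the Figure 2 caption, the following pure-Python sketch builds a MIP chain with a $2\times2$ box filter and then folds per-level gradients back into a full-resolution texture gradient. It is not the authors' implementation: the function names are hypothetical, the texture is a single-channel square grid with power-of-two size, and the assumption that coarse-level gradients reach the full-resolution texels through the transpose of the box filter (weight 1/4 per texel) follows from the chain rule rather than from anything stated in the caption.

```python
def downsample_2x2(level):
    """Average each 2x2 block of texels (box filter); sizes are assumed even."""
    h, w = len(level), len(level[0])
    return [
        [
            (level[2 * i][2 * j] + level[2 * i][2 * j + 1]
             + level[2 * i + 1][2 * j] + level[2 * i + 1][2 * j + 1]) / 4.0
            for j in range(w // 2)
        ]
        for i in range(h // 2)
    ]


def build_mip_chain(texture):
    """Return [level 0 (full resolution), level 1, ...] down to a 1x1 level."""
    levels = [texture]
    while len(levels[-1]) > 1 and len(levels[-1][0]) > 1:
        levels.append(downsample_2x2(levels[-1]))
    return levels


def upsample_grad_2x2(grad):
    """Transpose of the box filter: each coarse gradient feeds 4 texels at weight 1/4."""
    h, w = len(grad), len(grad[0])
    return [[grad[i // 2][j // 2] / 4.0 for j in range(2 * w)] for i in range(2 * h)]


def full_res_gradient(level_grads):
    """Combine gradients accumulated on every MIP level into one full-resolution gradient.

    level_grads[k] holds the gradient accumulated on level k (same shape as that level).
    """
    total = level_grads[-1]
    # Walk from the coarsest level to the finest, pushing gradients one level down each step.
    for grad in reversed(level_grads[:-1]):
        up = upsample_grad_2x2(total)
        total = [[a + b for a, b in zip(row_g, row_u)] for row_g, row_u in zip(grad, up)]
    return total
```

For a $4\times4$ texture this produces three levels ($4\times4$, $2\times2$, $1\times1$); a unit gradient placed only on the coarsest level spreads evenly over all 16 full-resolution texels.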
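
The blending rule described in the Figure 3 caption can be sketched for a single horizontally adjacent pixel pair as follows. This is a minimal illustration under stated assumptions, not the paper's code: colors are scalars, the edge is not horizontal, and the helper names `edge_crossing_x` and `antialias_pair` are hypothetical. The blend weight is linear in the crossing position, zero at the midpoint between the pixel centers and 0.5 at a pixel center, so the output colors vary continuously with the edge endpoints $p$ and $q$.

```python
def edge_crossing_x(p, q, y):
    """x coordinate where the line through p and q crosses the horizontal line at height y."""
    (px, py), (qx, qy) = p, q
    t = (y - py) / (qy - py)  # assumes the edge is not horizontal
    return px + t * (qx - px)


def antialias_pair(color_a, color_b, xa, xb, y, p, q):
    """Blend pixels A and B (centers at xa < xb, shared height y) across edge p-q.

    Returns the new (color_a, color_b).
    """
    x = edge_crossing_x(p, q, y)
    # Normalized crossing position: 0 at A's center, 1 at B's center.
    u = (x - xa) / (xb - xa)
    if not 0.0 <= u <= 1.0:
        return color_a, color_b  # the edge does not pass between the two centers
    if u >= 0.5:
        # Case (a): crossing inside pixel B, so A's color blends into B.
        w = u - 0.5
        return color_a, (1.0 - w) * color_b + w * color_a
    # Case (b): crossing inside pixel A, so B's color blends into A.
    w = 0.5 - u
    return (1.0 - w) * color_a + w * color_b, color_b
```

With pixel centers at $x=0.5$ and $x=1.5$, a vertical edge at $x=1.25$ lies inside $B$ and gives it a 25% share of $A$'s color, while an edge exactly at the midpoint $x=1.0$ leaves both pixels unchanged.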