
Zero Grads: Learning Local Surrogate Losses for Non-Differentiable Graphics

Michael Fischer, Tobias Ritschel

TL;DR

This work tackles the challenge of performing gradient-based optimization on non-differentiable, black-box graphics pipelines by learning a local differentiable surrogate. The approach, ZeroGrads, smooths the forward objective with a Gaussian kernel, fits a local neural or polynomial surrogate h(θ, φ), and employs a low-variance, locality-aware estimator to update both the surrogate parameters φ and the decision variables θ online. Key contributions include a fully online, self-supervised surrogate learning framework, an efficient sampling strategy that reduces gradient variance, and demonstrations showing scalability to high-dimensional settings (up to tens of thousands of variables) across rendering, procedural modeling, and animation tasks. The method broadens the applicability of gradient-based optimization in graphics, offering a general, scalable toolkit that complements specialized differentiable renderers and derivative-free optimizers alike.

Abstract

Gradient-based optimization is now ubiquitous across graphics, but unfortunately cannot be applied to problems with undefined or zero gradients. To circumvent this issue, the loss function can be manually replaced by a "surrogate" that has similar minima but is differentiable. Our proposed framework, ZeroGrads, automates this process by learning a neural approximation of the objective function, which in turn can be used to differentiate through arbitrary black-box graphics pipelines. We train the surrogate on an actively smoothed version of the objective and encourage locality, focusing the surrogate's capacity on what matters at the current training episode. The fitting is performed online, alongside the parameter optimization, and self-supervised, without pre-computed data or pre-trained models. As sampling the objective is expensive (it requires a full rendering or simulator run), we devise an efficient sampling scheme that allows for tractable run-times and competitive performance at little overhead. We demonstrate optimizing diverse non-convex, non-differentiable black-box problems in graphics, such as visibility in rendering, discrete parameter spaces in procedural modelling or optimal control in physics-driven animation. In contrast to other derivative-free algorithms, our approach scales well to higher dimensions, which we demonstrate on problems with up to 35k interlinked variables.
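
The idea of a local surrogate on a smoothed objective can be illustrated with a minimal 1D sketch. This is not the authors' implementation: it assumes a hypothetical piecewise-constant `objective`, and it swaps the online-trained neural surrogate for a quadratic polynomial that is refit by least squares at every step. The names `objective`, `surrogate_gradient`, and `optimize`, as well as all hyperparameters, are illustrative assumptions. For Gaussian-distributed samples, the fitted slope at the current parameter estimates the gradient of the Gaussian-smoothed objective, which is nonzero even where the raw objective is flat.

```python
import numpy as np

def objective(theta):
    # Hypothetical black box: piecewise constant, so its gradient is
    # zero almost everywhere. Minimal for theta in roughly [2.5, 3.5].
    return abs(np.round(theta) - 3.0)

def surrogate_gradient(theta, sigma, n_samples, rng):
    # Locality: sample offsets from a Gaussian centred on the current theta.
    offsets = sigma * rng.standard_normal(n_samples)
    losses = np.array([objective(theta + o) for o in offsets])
    # Assumption: a quadratic surrogate h(o) = c2*o^2 + c1*o + c0, refit
    # from scratch each step (the paper instead trains its surrogate online).
    c2, c1, c0 = np.polyfit(offsets, losses, deg=2)
    # The surrogate is differentiable; dh/do at o = 0 is simply c1.
    return c1

def optimize(theta0=-2.0, sigma=1.0, lr=0.2, steps=200, n_samples=16, seed=0):
    rng = np.random.default_rng(seed)
    theta = theta0
    for _ in range(steps):
        # Gradient descent on theta using the surrogate's gradient.
        theta -= lr * surrogate_gradient(theta, sigma, n_samples, rng)
    return theta

if __name__ == "__main__":
    theta = optimize()
    print("final theta:", theta, "loss:", objective(theta))
```

Each step costs `n_samples` evaluations of the black box, which is why the abstract stresses an efficient sampling scheme: in a real pipeline every evaluation is a full rendering or simulator run.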

Paper Structure (29 sections, 8 equations, 15 figures, 2 tables, 2 algorithms)

This paper contains 29 sections, 8 equations, 15 figures, 2 tables, 2 algorithms.

Figures (15)

  • Figure 1: Regular forward models $\mathcal{R}$ might not be able to provide gradients w.r.t. their input parameters $\theta$ (red arrow, top). Our approach, ZeroGrads, provides this ability via a local learned surrogate $h$ (green arrows, bottom) that maps $\theta$ to the associated loss and can be differentiated analytically.
  • Figure 2: A conceptual illustration of our approach; each subplot shows a one-dimensional cost landscape. For details, please refer to the Overview in the Method section.
  • Figure 3: A 1D example with the function from Fig. 2a and our neural surrogate (blue), which learns a local approximation of the loss (black, MAE) and provides gradients for the optimization parameter (green). The sampling distribution is displayed in grey; the state is shown at 0%, 10%, 35% and 100% of the total iterations.
  • Figure 4: Samples of the smooth objective (bottom row) on which we learn our surrogate: Perturbing the rigid scene parameters (top row) smooths discontinuities, e.g., the binary on/off for the LED task (an inset is shown).
  • Figure 5: We show an equal-sample comparison (i.e., the same budget of function evaluations) for the task of optimizing a $256 \times 256 \times 3$ texture. CMA-ES cannot be run on this example due to its quadratic memory complexity causing out-of-memory errors on our 64 GB RAM machine.
  • ...and 10 more figures
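
The out-of-memory claim in the Figure 5 caption can be checked with a quick back-of-the-envelope calculation. The sketch below assumes that CMA-ES stores one dense covariance matrix over all parameters in 64-bit floats; the helper name `full_covariance_bytes` is hypothetical, and a real implementation may need additional buffers or use a different precision.

```python
def full_covariance_bytes(n_params, bytes_per_entry=8):
    # Assumption: one dense n x n covariance matrix, float64 entries.
    return n_params * n_params * bytes_per_entry

# Number of optimized variables in a 256 x 256 RGB texture.
n = 256 * 256 * 3

# Memory for the covariance matrix alone, in GiB.
gib = full_covariance_bytes(n) / 2**30

if __name__ == "__main__":
    print(n, "parameters ->", gib, "GiB for the covariance matrix")
```

Under these assumptions the matrix alone needs 288 GiB, and still 144 GiB with 32-bit floats, so it cannot fit into 64 GB of RAM.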