Locally Orderless Images for Optimization in Differentiable Rendering

Ishit Mehta, Manmohan Chandraker, Ravi Ramamoorthi

TL;DR

The paper tackles gradient-sparsity challenges in differentiable rendering by introducing Locally Orderless Images (LOIs), a three-scale image representation built on inner scale $\sigma$, tonal scale $\beta$, and extent scale $\alpha$ to preserve local intensity distributions. It formulates an inverse rendering objective that matches rendered and reference histograms across scales using the Wasserstein distance, enabling robust optimization with standard RGB gradients. The approach is compatible with multiple differentiable renderers (vectorization, path tracing, rasterization) and complements variational optimization, yielding improved parameter recovery in challenging tasks such as shadows, caustics, and high-dimensional scene settings, including real data. LOIs offer a practical, scalable alternative to unreliable image pyramids, enhancing convergence and reliability in inverse rendering applications.

Abstract

Problems in differentiable rendering often involve optimizing scene parameters that cause motion in image space. The gradients for such parameters tend to be sparse, leading to poor convergence. While existing methods address this sparsity through proxy gradients such as topological derivatives or Lagrangian derivatives, they make simplifying assumptions about rendering. Multi-resolution image pyramids offer an alternative approach but prove unreliable in practice. We introduce a method that uses locally orderless images, where each pixel maps to a histogram of intensities that preserves local variations in appearance. Using an inverse rendering objective that minimizes histogram distance, our method extends support for sparsely defined image gradients and recovers optimal parameters. We validate our method on various inverse problems using both synthetic and real data.

Paper Structure

This paper contains 24 sections, 5 equations, 11 figures, 3 tables.

Figures (11)

  • Figure 1: Image gradients are sparse with respect to optimization parameters that induce motion in the image space ⑥. We show an inverse problem with the goal of recovering the position ($\theta$) of a distant light source from a synthetic image of a shiny ball, i.e., ① to ⑤. Existing methods compute proxy gradients such as Lagrangian derivatives [xingDifferentiableRenderingUsing2022], which track only primary-ray intersections, or variational derivatives [fischer2023plateau], which can be prone to local minima. Our method uses standard RGB gradients with an inverse rendering objective that matches locally orderless images.
  • Figure 2: Inner and Extent scale spaces. The image representation $\mathcal{P}(\mathbf{x},k,\alpha,\beta,\sigma)$ is composed of three distinct histogram-valued scale spaces. The $\sigma$-space (top) controls the effective resolution of the image and the $\alpha$-space (bottom) defines the spatial extent of histogram integration. The rendered images shown here are intensities sampled as $\mathcal{I}(\mathbf{x})=k, k\sim\mathcal{P}(\mathbf{x}, k, \theta, \alpha,\beta,\sigma)$ for given kernel parameters $\alpha$ and $\sigma$, and bin width $\beta$. We recover inverse rendering parameters $\theta$ by matching these locally orderless structures for rendered and reference images.
  • Figure 3: Scale-space matching extends gradient support. Given an image (a) of a disk we recover its position $\theta$ on the horizontal axis. At stationary resolution ($\sigma=0$), the initial and target (dotted) disks do not overlap, as shown in the corresponding 1D signals in (b). The image gradient $\frac{\partial\mathcal{I}}{\partial\theta}$ is sparse (orange) and is non-zero only at the boundaries of the disk (c-top). The error gradient $\frac{\partial\mathcal{E}}{\partial\theta}$ is zero everywhere (green) and the optimization is stuck in a local minimum. When matching at coarser scales (d), the gradients are no longer sparse (c-bottom), leading to optimal recovery.
  • Figure 4: Tonal Separation. Shown are two (a-top and a-bottom) 1D inverse problems where we recover disk positions ($\theta$) from images (left). Image matching within $\sigma$-space measures only the errors in the mean of the intensity distributions at each scale. In inverse settings that involve multiple objects with different appearances, this approach is likely to get stuck in a local minimum (a-center-left). The $\alpha$-space integration kernels are intensity-aware and treat images as sets of distinct equal-intensity isophotes (b). When images are matched in all three scale spaces, the optimization is less prone to getting stuck in local minima (a-center-right).
  • Figure 5: Histogram matching is less sensitive to noise. To recover the position ($\theta$) of a circular disk from a noisy reference image (a-bottom-right), methods that match images only at their stationary resolution or in $\sigma$-scale space fail because they overlook imprecision and uncertainty in radiance measurements. Our method uses a tonal parameter ($\beta$) to account for intensity uncertainty and an extent scale-space to preserve the distribution modes at coarser scales (b), leading to optimal recovery of $\theta$.
  • ...and 6 more figures
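
The captions for Figures 2 and 3 describe per-pixel histograms built from three scales and a disk-recovery problem in which a plain image loss is flat. The following is a minimal 1D sketch of that idea in pure Python, not the authors' implementation. It makes several assumptions: Gaussian kernels for all three scales, eight intensity bins, a box signal standing in for the disk, and a per-pixel 1D Wasserstein-1 distance computed from cumulative histograms. The names `box`, `loi`, `wasserstein1`, `loi_loss`, and `l2_loss` are hypothetical helpers introduced for illustration. The sketch compares loss values at two candidate positions rather than differentiating through a renderer.

```python
import math

def gaussian_kernel(scale):
    """Normalized 1D Gaussian weights; a scale of 0 means no blur."""
    if scale <= 0:
        return [1.0]
    radius = int(math.ceil(3 * scale))
    weights = [math.exp(-0.5 * (i / scale) ** 2) for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def blur(signal, scale):
    """Gaussian blur of a 1D signal, clamping indices at the borders."""
    kernel = gaussian_kernel(scale)
    r = len(kernel) // 2
    n = len(signal)
    return [
        sum(kernel[j + r] * signal[min(max(i + j, 0), n - 1)] for j in range(-r, r + 1))
        for i in range(n)
    ]

def box(theta, n=128, half_width=4):
    """1D stand-in for the disk: intensity 1 within half_width of theta, else 0."""
    return [1.0 if abs(i - theta) <= half_width else 0.0 for i in range(n)]

def loi(signal, bins, sigma, beta, alpha):
    """Per-pixel soft histogram (assumed form of a locally orderless image).

    sigma: inner scale (blur of the image itself)
    beta:  tonal scale (soft assignment of intensities to bins)
    alpha: extent scale (spatial window over which the histogram is pooled)
    """
    image = blur(signal, sigma)
    channels = [
        blur([math.exp(-0.5 * ((v - k) / beta) ** 2) for v in image], alpha)
        for k in bins
    ]
    hists = []
    for x in range(len(signal)):
        column = [c[x] for c in channels]
        total = sum(column)
        hists.append([v / total for v in column])  # normalize to sum to 1
    return hists

def wasserstein1(p, q, bin_width):
    """1D Wasserstein-1 distance: accumulated |CDF difference| times bin width."""
    cdf_p = cdf_q = dist = 0.0
    for a, b in zip(p, q):
        cdf_p += a
        cdf_q += b
        dist += abs(cdf_p - cdf_q) * bin_width
    return dist

BINS = [i / 7 for i in range(8)]  # assumed: 8 evenly spaced bins on [0, 1]

def loi_loss(theta, target, sigma=0.0, beta=0.1, alpha=8.0):
    """Mean per-pixel histogram distance between rendered and reference signals."""
    rendered = loi(box(theta), BINS, sigma, beta, alpha)
    reference = loi(box(target), BINS, sigma, beta, alpha)
    width = BINS[1] - BINS[0]
    return sum(wasserstein1(p, q, width) for p, q in zip(rendered, reference)) / len(rendered)

def l2_loss(theta, target):
    """Mean squared error at the stationary resolution (no scale space)."""
    a, b = box(theta), box(target)
    return sum((u - v) ** 2 for u, v in zip(a, b)) / len(a)

# Two candidate positions that do not overlap the target at position 70.
print("L2 :", l2_loss(50, 70), l2_loss(54, 70))
print("LOI:", loi_loss(50, 70), loi_loss(54, 70))
```

In this sketch the squared-error loss is identical at both candidate positions because the boxes never overlap the target, so it gives no direction to move in. The histogram loss is lower at the position closer to the target because the extent kernel spreads each box's contribution over neighboring pixels.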