Robust Inverse Graphics via Probabilistic Inference

Tuan Anh Le, Pavel Sountsov, Matthew D. Hoffman, Ben Lee, Brian Patton, Rif A. Saurous

TL;DR

Robust Inverse Graphics (RIG) tackles monocular 3D scene inference under unknown corruptions by treating the scene latent $x$ and the corruption latent $c$ within a joint posterior $p(x, c \mid y)$, using NeRF-based priors. The work shows that MAP estimation can fail by producing "billboard" explanations, in which the corruption explains away the observation, whereas posterior sampling yields faithful decorruptions. For diffusion-based scene priors, it introduces ReGAL, which conditions the diffusion model on the observation in the presence of the auxiliary corruption latent. Contributions include a general Bayesian framework, a diffusion conditioning method with auxiliary latents (with importance-sampling (IS) and sequential Monte Carlo (SMC) variants), and extensive experiments on ShapeNet and MultiShapeNet showing improvements over MAP and standard depth predictors under rain, fog, and FOV perturbations. The work demonstrates how scene priors can be leveraged beyond generation tasks, notes limitations such as the need for 3D priors and the computational cost, and points to faster, amortized inference as future work.

Abstract

How do we infer a 3D scene from a single image in the presence of corruptions like rain, snow, or fog? Straightforward domain randomization relies on knowing the family of corruptions ahead of time. Here, we propose a Bayesian approach, dubbed robust inverse graphics (RIG), that relies on a strong scene prior and an uninformative uniform corruption prior, making it applicable to a wide range of corruptions. Given a single image, RIG performs posterior inference jointly over the scene and the corruption. We demonstrate this idea by training a neural radiance field (NeRF) scene prior and using a secondary NeRF to represent the corruptions over which we place an uninformative prior. RIG, trained only on clean data, outperforms depth estimators and alternative NeRF approaches that perform point estimation instead of full inference. The results hold for a number of scene prior architectures based on normalizing flows and diffusion models. For the latter, we develop reconstruction guidance with auxiliary latents (ReGAL), a diffusion conditioning algorithm that is applicable in the presence of auxiliary latent variables such as the corruption. RIG demonstrates how scene priors can be used beyond generation tasks.
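
For reference, Bayes' rule expresses the joint posterior over scene and corruption in terms of the scene prior $p(x)$, the corruption prior $p(c)$, and the likelihood $p(y \mid x, c)$. This is a standard write-up under the assumption of independent priors on $x$ and $c$, not an equation copied from the paper:

$$
p(x, c \mid y) = \frac{p(x)\, p(c)\, p(y \mid x, c)}{p(y)} \;\propto\; p(x)\, p(y \mid x, c) \quad \text{when } p(c) \propto 1,
\qquad
p(x \mid y) = \int p(x, c \mid y)\, dc .
$$

Under the uniform corruption prior, point estimation over $(x, c)$ maximizes $p(x)\, p(y \mid x, c)$ jointly, whereas full inference reasons about the marginal $p(x \mid y)$, which averages over all corruptions consistent with $y$; this difference is one way to read why the two can disagree.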

Paper Structure

This paper contains 18 sections, 1 theorem, 19 equations, 15 figures, 2 tables, and 3 algorithms.

Key Result

Proposition 2.1

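Assume that $p(c) \propto 1$, that for any $x$ there exists a $c$ such that $R(x, c) = y$, and that $p(y \mid x, c)$ is maximized if and only if $R(x, c) = y$, where $R(x, c)$ denotes the rendering of scene $x$ under corruption $c$. Then the set of MAP solutions is

$$\arg\max_{x, c}\, p(x, c \mid y) = \left\{ (x^\ast, c) : x^\ast \in \arg\max_{x} p(x),\ R(x^\ast, c) = y \right\},$$

that is, $x$ is the maximum a priori scene, paired with a corruption $c$ under which it renders exactly to $y$, either because $c$ covers it completely or because the uncovered parts happen to render to $y$.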
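
To make the billboard effect concrete, here is a minimal numerical sketch under hypothetical assumptions; it is not the paper's model, nor necessarily the toy model of Figure 3. A scalar scene $x$ with prior $\mathcal{N}(0, 1)$ renders to $D$ identical pixels, and a corruption $c = (m, v)$ overwrites pixel $i$ with value $v_i$ whenever $m_i = 1$. With $m_i \sim \mathrm{Bernoulli}(0.5)$ and $v_i \sim \mathrm{Uniform}(-L/2, L/2)$, $p(c)$ is constant, every $x$ can render exactly to $y$ by occluding all pixels, and a Gaussian likelihood is maximized exactly when $R(x, c) = y$, so the assumptions of Proposition 2.1 hold. All names and constants below are illustrative.

```python
import numpy as np

# Hypothetical toy model (not the paper's): a scalar "scene" x renders to D identical
# pixels; the corruption c = (m, v) overwrites pixel i with v_i wherever m_i = 1.
# Every pixel has a v_i (unused when m_i = 0), so p(c) is constant.
rng = np.random.default_rng(0)
D, sigma, L = 20, 0.1, 10.0            # pixels, noise std, width of the uniform range for v
x_true = 2.0
mask_true = np.zeros(D, dtype=bool)
mask_true[:5] = True                   # the first 5 pixels are corrupted
y = np.full(D, x_true)
y[mask_true] = rng.uniform(-L / 2, L / 2, size=mask_true.sum())
y += sigma * rng.normal(size=D)        # Gaussian observation noise on every pixel

def log_normal(z, mu, s):
    """Log density of N(z; mu, s^2)."""
    return -0.5 * ((z - mu) / s) ** 2 - np.log(np.sqrt(2 * np.pi) * s)

xs = np.linspace(-4.0, 4.0, 8001)      # grid over the scene latent (contains 0.0)
log_prior = log_normal(xs, 0.0, 1.0)   # p(x) = N(0, 1)
clean_ll = log_normal(y[None, :], xs[:, None], sigma)  # shape (grid, D): pixel shows x

# Joint MAP over (x, m, v): choosing m_i = 1 and v_i = y_i attains the maximal
# per-pixel likelihood N(0; 0, sigma^2) for every x, so only p(x) decides.
best_pixel_ll = np.maximum(clean_ll, log_normal(0.0, 0.0, sigma))
x_map = xs[np.argmax(log_prior + best_pixel_ll.sum(axis=1))]

# Marginal posterior p(x | y): integrate out m_i and v_i. A pixel explained by the
# corruption contributes about 0.5 / L (ignoring edge effects near +-L/2); one
# explained by the scene contributes 0.5 * N(y_i; x, sigma^2).
pixel_marginal = np.logaddexp(np.log(0.5) + clean_ll, np.log(0.5 / L))
log_post = log_prior + pixel_marginal.sum(axis=1)
x_post_mode = xs[np.argmax(log_post)]

print(f"joint MAP x = {x_map:.3f}, marginal posterior mode x = {x_post_mode:.3f}")
```

In this sketch the joint MAP returns the prior mode $x = 0$ with every pixel occluded (a billboard), while the marginal posterior mode lands near the true value of 2: a broad uniform prior on $v_i$ spreads its mass thinly, so explaining a pixel with the scene earns far more marginal likelihood than explaining it with the corruption.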

Figures (15)

  • Figure 1: Robust Inverse Graphics (RIG). By modeling the generative process of 2D renderings $y$ of 3D scenes, we can reconstruct clean scenes by performing joint probabilistic inference on scene latents ($x$) and corruption parameters ($c$).
  • Figure 2: Example corruptions. FOV refers to the field-of-view intrinsic parameter of the pinhole camera parameterization.
  • Figure 3: A toy model of full posterior inference avoiding "billboard" solutions of MAP. See main text for details.
  • Figure 4: Example decorruptions for the cloud corruption. The Clean columns are conditioned on the Clean RGB data, while the rest are conditioned on the Corrupted RGB data. For ShapeNet, the black outlines on the depth images are the predicted masks, except for DPT, where the ground-truth mask is used. For MultiShapeNet, the ground-truth mask is used for all methods.
  • Figure 5: The MAP solution relies more on the corruption NeRF to explain the observation than the VI solution does.
  • ...and 10 more figures

Theorems & Definitions (2)

  • Proposition 2.1
  • Proof