VoGE: A Differentiable Volume Renderer using Gaussian Ellipsoids for Analysis-by-Synthesis

Angtian Wang, Peng Wang, Jian Sun, Adam Kortylewski, Alan Yuille

TL;DR

VoGE tackles differentiable rendering for explicit geometric representations by using Gaussian ellipsoids as 3D primitives and rendering via ray-traced volume densities. It introduces an efficient approximate closed-form solution for density aggregation and a coarse-to-fine rendering pipeline, yielding real-time performance with a CUDA implementation. Key contributions include the Gaussian-ellipsoid reconstruction kernel, a differentiable rendering pipeline that naturally handles occlusions, and integration as a neural network module for sampling and rendering. Empirically, VoGE outperforms state-of-the-art differentiable renderers on in-the-wild pose estimation and texture-related tasks, while maintaining competitive rendering speed and providing improved gradient signals for occlusion reasoning and inverse rendering.

Abstract

Gaussian reconstruction kernels were proposed by Westover (1990) and studied by the computer graphics community in the 1990s; they offer an alternative to meshes and point clouds for representing 3D object geometry. In contrast, current state-of-the-art (SoTA) differentiable renderers (Liu et al., 2019) use rasterization to collect triangles or points on each image pixel and blend them based on the viewing distance. In this paper, we propose VoGE, which uses volumetric Gaussian reconstruction kernels as geometric primitives. The VoGE rendering pipeline uses ray tracing to capture the nearest primitives and blends them as mixtures based on their volume density distributions along the rays. To render efficiently with VoGE, we propose an approximate closed-form solution for the volume density aggregation and a coarse-to-fine rendering strategy. Finally, we provide a CUDA implementation of VoGE, which enables real-time rendering at a speed competitive with PyTorch3D. Quantitative and qualitative experimental results show that VoGE outperforms SoTA counterparts when applied to various vision tasks, e.g., object pose estimation, shape/texture fitting, and occlusion reasoning. The VoGE library and demos are available at: https://github.com/Angtian/VoGE.
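
To make the idea of blending primitives by their volume density along a ray more concrete, here is a minimal NumPy sketch. It is not the VoGE implementation and does not use the paper's approximate closed-form solution: it integrates numerically by brute force. The exact formulas are assumptions based on the standard emission-absorption model: an unnormalized Gaussian density per kernel, a transmittance that decays with the accumulated total density, and a hypothetical density scale `tau`. The function names `gaussian_density` and `kernel_weights` are illustrative only.

```python
import numpy as np


def gaussian_density(points, mean, cov):
    """Unnormalized Gaussian density exp(-0.5 * d^T cov^-1 d) at each point (N, 3)."""
    diff = points - mean
    maha = np.einsum("ni,ij,nj->n", diff, np.linalg.inv(cov), diff)
    return np.exp(-0.5 * maha)


def kernel_weights(direction, means, covs, tau=1.0, t_max=10.0, n_steps=2000):
    """Numerically integrate one weight per kernel along a ray from the origin.

    Assumed model (not taken from the paper): the weight of kernel k is
    tau * integral of transmittance(t) * rho_k(r(t)) dt, where the
    transmittance is exp(-tau * accumulated total density before t).
    """
    direction = np.asarray(direction, dtype=float)
    direction = direction / np.linalg.norm(direction)
    t = np.linspace(0.0, t_max, n_steps)
    dt = t[1] - t[0]
    points = t[:, None] * direction[None, :]          # samples r(t) on the ray
    rho = np.stack([gaussian_density(points, m, c)    # shape (K, n_steps)
                    for m, c in zip(means, covs)])
    total = rho.sum(axis=0)
    # Exclusive cumulative sum: density accumulated strictly before each sample.
    optical_depth = tau * (np.cumsum(total) - total) * dt
    transmittance = np.exp(-optical_depth)
    return tau * (transmittance[None, :] * rho).sum(axis=1) * dt


# Two isotropic kernels on the optical axis: a red one in front of a blue one.
means = np.array([[0.0, 0.0, 3.0], [0.0, 0.0, 5.0]])
covs = np.stack([0.09 * np.eye(3), 0.09 * np.eye(3)])
colors = np.array([[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])

weights = kernel_weights([0.0, 0.0, 1.0], means, covs)
pixel = weights @ colors  # interpolate per-kernel attributes with the weights
print(weights, pixel)
```

In this toy scene the front kernel receives the larger weight because it attenuates the ray before the back kernel is reached, which is the sense in which density aggregation handles occlusion without a hard depth test.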

Paper Structure

This paper contains 29 sections, 30 equations, 21 figures, and 5 tables.

Figures (21)

  • Figure 1: VoGE conducts ray tracing of volume densities. Given the Gaussian ellipsoids, i.e., a set of ellipsoidal 3D Gaussian reconstruction kernels, VoGE first samples rays $r(t)$. Along each ray, VoGE traces the density distribution of each ellipsoid, $\rho_k(r(t))$. The occupancy $T(r(t))$ is then accumulated via density aggregation along the ray. The observation of each Gaussian ellipsoid kernel, $W_k$, is computed as the integral of the reweighted per-kernel volume density $W_k(r(t))$. Finally, VoGE synthesizes the image by using the computed $W_k$ on each pixel to interpolate per-kernel attributes. In practice, the density aggregation is bootstrapped via approximate closed-form solutions.
  • Figure 2: Rendering with increasing numbers of Gaussian ellipsoids. Top: the kernel-to-pixel weights along the median row of the image; the colors indicate the corresponding Gaussian ellipsoids. Bottom: the rendered RGB image. Note that VoGE resolves occlusion naturally in a contiguous way.
  • Figure 3: Computing the gradient of $\mathbf{M}$ when rendering two ellipsoids. The colored numbers below indicate the $\mathbf{M}$ of each ellipsoid. The red arrow and $G_x, G_y$ show $\frac{\partial (\mathbf{I} - \mathbf{\hat{I}})^2}{\partial \mathbf{M_{red}}}$.
  • Figure 4: The forward process of VoGE rendering. The camera is described by the extrinsic matrix $\mathbf{E}$, composed of $\mathbf{R}$ and $\mathbf{T}$, and the intrinsic matrix $\mathbf{I}$, composed of $F$ and $O_x, O_y$. Given Gaussian ellipsoids, the VoGE renderer synthesizes an image $\mathbf{O}$.
  • Figure 5: Comparison of the rendering speeds of VoGE and PyTorch3D, reported in images per second (higher is better). We evaluate the rendering speed using cuboids with different numbers of primitives (vertices, ellipsoids), illustrated using different colors, as well as different image sizes and numbers of primitives per pixel.
  • ...and 16 more figures
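
The camera model named in the Figure 4 caption (extrinsics $\mathbf{R}$, $\mathbf{T}$ and intrinsics $F$, $O_x$, $O_y$) can be illustrated with a small pinhole-projection sketch. The conventions here are assumptions, not taken from the paper: points are mapped world-to-camera as `R @ X + T`, the camera looks down the +z axis, and a single focal length `F` in pixels is shared by both image axes. The helper name `project_centers` is hypothetical and is not part of the VoGE library.

```python
import numpy as np


def project_centers(centers, R, T, F, O_x, O_y):
    """Project 3D centers (N, 3) to pixel coordinates (N, 2) and depths (N,).

    Assumes a world-to-camera transform X_cam = R @ X + T and a pinhole
    camera with focal length F and principal point (O_x, O_y).
    """
    cam = centers @ R.T + T              # world -> camera coordinates
    depth = cam[:, 2]
    u = F * cam[:, 0] / depth + O_x      # perspective divide, then shift
    v = F * cam[:, 1] / depth + O_y
    return np.stack([u, v], axis=1), depth


# Camera placed so that the world origin sits 4 units along the optical axis.
R = np.eye(3)
T = np.array([0.0, 0.0, 4.0])
centers = np.array([[0.0, 0.0, 0.0], [1.0, -0.5, 0.0]])

pixels, depth = project_centers(centers, R, T, F=500.0, O_x=128.0, O_y=128.0)
print(pixels, depth)
```

A center on the optical axis lands on the principal point, and off-axis centers are displaced in proportion to `F` divided by depth. A full renderer would also need to account for each ellipsoid's extent, which this sketch ignores.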