Differentiable Volumetric Rendering: Learning Implicit 3D Representations without 3D Supervision
Michael Niemeyer, Lars Mescheder, Michael Oechsle, Andreas Geiger
TL;DR
The paper tackles 3D reconstruction without 3D supervision by introducing Differentiable Volumetric Rendering (DVR), which learns implicit shape and texture fields from 2D images using depth gradients derived analytically via implicit differentiation. DVR renders 2D images from the implicit representation and is trained with 2D supervision through RGB, depth, and occupancy losses; its backward pass is memory-efficient because it does not need to store volumetric data. The approach supports both single-view and multi-view reconstruction: single-view results rival those of methods trained with full 3D supervision, and multi-view reconstruction yields watertight meshes directly, including on real-world data such as the DTU dataset. This broadens the applicability of implicit representations by enabling 2D-supervised learning without discretized voxel grids or template meshes.
Abstract
Learning-based 3D reconstruction methods have shown impressive results. However, most methods require 3D supervision, which is often hard to obtain for real-world datasets. Recently, several works have proposed differentiable rendering techniques to train reconstruction models from RGB images. Unfortunately, these approaches are currently restricted to voxel- and mesh-based representations, suffering from discretization or low resolution. In this work, we propose a differentiable rendering formulation for implicit shape and texture representations. Implicit representations have recently gained popularity as they represent shape and texture continuously. Our key insight is that depth gradients can be derived analytically using the concept of implicit differentiation. This allows us to learn implicit shape and texture representations directly from RGB images. We experimentally show that our single-view reconstructions rival those learned with full 3D supervision. Moreover, we find that our method can be used for multi-view 3D reconstruction, directly resulting in watertight meshes.
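To make the implicit-differentiation idea concrete, here is a minimal sketch on a toy problem; it is not the authors' implementation. It assumes the surface is the level set f_θ(p) = τ of an occupancy field and that a ray is p = r0 + d·w. Differentiating the condition f_θ(r0 + d·w) = τ with respect to θ gives ∂d/∂θ = -(∇f_θ(p) · w)^(-1) ∂f_θ(p)/∂θ. The toy field (a soft sphere whose only parameter is its radius), the constants SHARPNESS and TAU, and all function names are assumptions made for illustration; a learned model would use a network and automatic differentiation in place of the hand-written partial derivatives.

```python
import math

SHARPNESS = 10.0  # hypothetical steepness of the toy occupancy falloff
TAU = 0.5         # occupancy level that defines the surface


def point_on_ray(origin, direction, depth):
    """Return the 3D point origin + depth * direction."""
    return [o + depth * w for o, w in zip(origin, direction)]


def occupancy(p, radius):
    """Toy occupancy field: close to 1 inside a sphere, close to 0 outside."""
    norm = math.sqrt(sum(c * c for c in p))
    return 1.0 / (1.0 + math.exp(SHARPNESS * (norm - radius)))


def surface_depth(origin, direction, radius, near=0.0, far=6.0, steps=64):
    """Find the first depth where occupancy reaches TAU.

    A coarse march brackets the crossing; bisection then refines it.
    Returns None if the ray never enters the object.
    """
    prev = near
    for i in range(1, steps + 1):
        cur = near + (far - near) * i / steps
        if occupancy(point_on_ray(origin, direction, cur), radius) >= TAU:
            lo, hi = prev, cur
            for _ in range(60):
                mid = 0.5 * (lo + hi)
                if occupancy(point_on_ray(origin, direction, mid), radius) >= TAU:
                    hi = mid
                else:
                    lo = mid
            return 0.5 * (lo + hi)
        prev = cur
    return None


def depth_gradient(origin, direction, radius, depth):
    """Gradient of the surface depth w.r.t. the radius via implicit differentiation."""
    p = point_on_ray(origin, direction, depth)
    norm = math.sqrt(sum(c * c for c in p))
    s = occupancy(p, radius)
    # Partial derivative of the field w.r.t. its parameter (the radius).
    df_dradius = SHARPNESS * s * (1.0 - s)
    # Spatial gradient of the field projected onto the ray direction.
    df_ddepth = sum(-SHARPNESS * s * (1.0 - s) * c / norm * w
                    for c, w in zip(p, direction))
    return -df_dradius / df_ddepth


origin, direction, radius = [0.5, 0.0, -3.0], [0.0, 0.0, 1.0], 1.0
depth = surface_depth(origin, direction, radius)
analytic = depth_gradient(origin, direction, radius, depth)

# Cross-check: finite difference through the full root-finding procedure.
h = 1e-4
numeric = (surface_depth(origin, direction, radius + h)
           - surface_depth(origin, direction, radius - h)) / (2 * h)
print(depth, analytic, numeric)
```

Note that depth_gradient only evaluates the field at the surface point itself; none of the samples visited during the ray march are needed for the gradient, which is why this kind of backward pass does not have to keep volumetric intermediates. For this particular ray the depth also has the closed form 3 - sqrt(radius² - 0.25), so the analytic and finite-difference values can be checked by hand.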
