UniVoxel: Fast Inverse Rendering by Unified Voxelization of Scene Representation

Shuang Wu, Songlin Tang, Guangming Lu, Jianzhuang Liu, Wenjie Pei

TL;DR

UniVoxel addresses the high optimization cost of inverse rendering with a unified voxelization framework that represents a scene explicitly through an SDF field for geometry and a semantic field for materials and illumination. Incident light is modeled with local Spherical Gaussians, so geometry, albedo, roughness, and lighting are learned jointly without costly multi-bounce ray tracing. Per-scene training time drops from hours to about 18 minutes, while reconstruction and relighting quality on synthetic and real-world datasets is competitive with or better than several state-of-the-art approaches. The explicit, voxel-based representation, with hash encoding for scalability, makes inverse rendering more practical for complex scenes.

Abstract

Typical inverse rendering methods focus on learning implicit neural scene representations by modeling the geometry, materials and illumination separately, which entails significant computation for optimization. In this work, we design a Unified Voxelization framework for explicit learning of scene representations, dubbed UniVoxel, which allows for efficient modeling of the geometry, materials and illumination jointly, thereby accelerating inverse rendering significantly. Specifically, we propose to encode a scene into a latent volumetric representation, based on which the geometry, materials and illumination can be readily learned via lightweight neural networks in a unified manner. In particular, an essential design of UniVoxel is that we leverage local Spherical Gaussians to represent the incident light radiance, which enables the seamless integration of illumination modeling into the unified voxelization framework. This design enables UniVoxel to model the joint effects of direct lighting, indirect lighting and light visibility efficiently without expensive multi-bounce ray tracing. Extensive experiments on multiple benchmarks covering diverse scenes demonstrate that UniVoxel boosts the optimization efficiency significantly compared to other methods, reducing the per-scene training time from hours to 18 minutes, while achieving favorable reconstruction quality. Code is available at https://github.com/freemantom/UniVoxel.
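
To make the Spherical Gaussian idea in the abstract concrete, the following is a minimal sketch of how incident radiance from a direction can be evaluated as a sum of Spherical Gaussian lobes. It uses the standard lobe form G(v) = mu * exp(lambda * (v · xi - 1)). The function names (`sg_eval`, `incident_radiance`), the lobe layout, and the example values are assumptions for illustration; the abstract does not specify UniVoxel's exact parameterization, and in the paper the lobe parameters are described as being predicted locally rather than fixed as they are here.

```python
import math


def normalize(v):
    """Return v scaled to unit length."""
    n = math.sqrt(sum(c * c for c in v))
    return tuple(c / n for c in v)


def sg_eval(direction, axis, sharpness, amplitude):
    """Evaluate one Spherical Gaussian lobe for a query direction.

    axis      -- lobe axis xi (normalized internally)
    sharpness -- lambda, larger values give a narrower lobe
    amplitude -- RGB amplitude mu
    """
    d = normalize(direction)
    a = normalize(axis)
    cos_angle = sum(x * y for x, y in zip(d, a))
    weight = math.exp(sharpness * (cos_angle - 1.0))
    return tuple(c * weight for c in amplitude)


def incident_radiance(direction, lobes):
    """Sum the contributions of several lobes (hypothetical lobe list)."""
    total = [0.0, 0.0, 0.0]
    for axis, sharpness, amplitude in lobes:
        for i, c in enumerate(sg_eval(direction, axis, sharpness, amplitude)):
            total[i] += c
    return tuple(total)


# Two illustrative lobes: a sharp one pointing up, a broad one pointing along +x.
lobes = [
    ((0.0, 0.0, 1.0), 10.0, (1.0, 0.9, 0.8)),
    ((1.0, 0.0, 0.0), 2.0, (0.2, 0.3, 0.4)),
]
radiance_up = incident_radiance((0.0, 0.0, 1.0), lobes)
```

Because each lobe is a closed-form function of direction, querying the light arriving at a point costs a few exponentials rather than a traced ray, which is the efficiency argument the abstract makes.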

Paper Structure (29 sections, 17 equations, 18 figures, 5 tables)

Figures (18)

  • Figure 1: Overview of the proposed UniVoxel. Typical methods [chen2022tracing, zhang2021nerfactor, zhang2022modeling] for inverse rendering learn implicit neural scene representations from spatial field by modeling the geometry, materials and illumination individually employing deep MLP networks. In contrast, our UniVoxel learns explicit scene representations by performing voxelization towards two essential scene elements: SDF field and semantic field, based on which the geometry, materials and illumination can be learned with lightweight networks in a unified manner, boosting the optimization efficiency of inverse rendering substantially.
  • Figure 1: Visualization of the reconstructed albedo maps by different illumination models.
  • Figure 2: Overall framework of the proposed UniVoxel. It performs voxelization towards the SDF field and semantic field to obtain explicit scene representations. The learned volumetric SDF field focuses on capturing the scene geometry while the semantic field characterizes the materials and illumination for the scene. As a result, our UniVoxel is able to learn the materials (including the albedo and roughness) and illumination using lightweight MLP networks based on the voxelization of the semantic field. Meanwhile, the surface normal and opacity for an arbitrary 3D point can be easily derived from the voxelization of the SDF field. Hence, our model is able to learn all these scene properties efficiently in a unified manner. In particular, we leverage Spherical Gaussians (SG) to model the incident light field, which allows for unified learning of the illumination with other scene properties based on the voxelization of the scene representation.
  • Figure 2: Visualization of the albedo maps reconstructed by our method with/without the regularization for Spherical Gaussians.
  • Figure 3: Qualitative comparisons on 2 scenes from the MII synthetic dataset. More qualitative results are shown in the appendix.
  • ...and 13 more figures
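
The Figure 2 caption notes that the surface normal of an arbitrary 3D point can be derived from the voxelized SDF field. The sketch below illustrates one common way to do this: interpolate the SDF trilinearly from a dense grid and take the normalized gradient by central differences. The dense nested-list grid, the helper names, the step size `eps`, and the sphere test scene are all assumptions for illustration; the captions here do not state which interpolation or gradient scheme UniVoxel uses, and the conversion from SDF to opacity is left out for the same reason.

```python
import math


def make_sphere_sdf_grid(n, center, radius):
    """Sample the SDF of a sphere on an n x n x n grid with unit spacing."""
    return [[[math.dist((i, j, k), center) - radius
              for k in range(n)]
             for j in range(n)]
            for i in range(n)]


def trilinear(grid, p):
    """Trilinearly interpolate the grid at continuous position p (clamped)."""
    n = len(grid)
    base, frac = [], []
    for c in p:
        c = min(max(c, 0.0), n - 1.0)
        i0 = min(int(math.floor(c)), n - 2)  # keep i0 + 1 inside the grid
        base.append(i0)
        frac.append(c - i0)
    (i, j, k), (fx, fy, fz) = base, frac
    value = 0.0
    for di in (0, 1):
        for dj in (0, 1):
            for dk in (0, 1):
                w = ((fx if di else 1.0 - fx)
                     * (fy if dj else 1.0 - fy)
                     * (fz if dk else 1.0 - fz))
                value += w * grid[i + di][j + dj][k + dk]
    return value


def sdf_normal(grid, p, eps=0.5):
    """Approximate the unit normal as the normalized SDF gradient."""
    grad = []
    for axis in range(3):
        hi, lo = list(p), list(p)
        hi[axis] += eps
        lo[axis] -= eps
        grad.append((trilinear(grid, hi) - trilinear(grid, lo)) / (2.0 * eps))
    norm = math.sqrt(sum(g * g for g in grad))
    if norm == 0.0:
        return (0.0, 0.0, 0.0)  # gradient undefined here
    return tuple(g / norm for g in grad)


# Sphere of radius 5 centred in a 16^3 grid; query a point on its +x side.
grid = make_sphere_sdf_grid(16, (7.5, 7.5, 7.5), 5.0)
normal = sdf_normal(grid, (12.5, 7.5, 7.5))  # close to (1, 0, 0)
```

Since both the interpolation and the gradient are local grid lookups, no deep network evaluation is needed to obtain geometry at a query point, which matches the caption's claim that these quantities are easy to derive from the voxelization.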