UniVoxel: Fast Inverse Rendering by Unified Voxelization of Scene Representation
Shuang Wu, Songlin Tang, Guangming Lu, Jianzhuang Liu, Wenjie Pei
TL;DR
UniVoxel tackles the inefficiency of inverse rendering with a unified voxelization framework that explicitly represents a scene through an SDF geometry field and a semantic field for materials and illumination. Incident illumination is modeled with local Spherical Gaussians, enabling joint learning of geometry, albedo, roughness, and lighting without costly multi-bounce ray tracing. The method cuts per-scene training time from hours to about 18 minutes while achieving competitive or superior reconstruction and relighting quality on synthetic and real-world datasets compared with several state-of-the-art approaches. The explicit, voxel-based representation, with hash encoding for scalability, makes inverse rendering more practical for complex scenes.
Abstract
Typical inverse rendering methods focus on learning implicit neural scene representations by modeling the geometry, materials and illumination separately, which makes optimization computationally expensive. In this work, we design a Unified Voxelization framework for explicit learning of scene representations, dubbed UniVoxel, which allows for efficient modeling of the geometry, materials and illumination jointly, thereby accelerating inverse rendering significantly. Specifically, we propose to encode a scene into a latent volumetric representation, based on which the geometry, materials and illumination can be readily learned via lightweight neural networks in a unified manner. In particular, an essential design choice in UniVoxel is to use local Spherical Gaussians to represent the incident light radiance, which enables the seamless integration of illumination modeling into the unified voxelization framework. This design enables UniVoxel to model the joint effects of direct lighting, indirect lighting and light visibility efficiently without expensive multi-bounce ray tracing. Extensive experiments on multiple benchmarks covering diverse scenes demonstrate that UniVoxel boosts the optimization efficiency significantly compared to other methods, reducing the per-scene training time from hours to 18 minutes, while achieving favorable reconstruction quality. Code is available at https://github.com/freemantom/UniVoxel.
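
To make the Spherical Gaussian (SG) representation of incident radiance concrete, the following is a minimal sketch of evaluating a mixture of SG lobes at query directions, using the standard form G(w) = mu * exp(lambda * (w . xi - 1)). It is an illustration only, not the authors' implementation: the function name, argument names, and array shapes are assumptions, and in UniVoxel the lobe parameters would be predicted per location from the latent volume rather than supplied as fixed arrays.

```python
import numpy as np

def eval_spherical_gaussians(directions, lobe_axes, sharpness, amplitudes):
    """Evaluate a sum of SG lobes at unit directions.

    directions: (N, 3) query directions w
    lobe_axes:  (K, 3) lobe axes xi_k
    sharpness:  (K,)   lobe sharpness lambda_k (> 0)
    amplitudes: (K, 3) RGB lobe amplitudes mu_k
    Returns (N, 3) radiance: sum_k mu_k * exp(lambda_k * (w . xi_k - 1)).
    All names and shapes here are hypothetical, chosen for illustration.
    """
    # Normalize so the dot product is the cosine of the angle between vectors.
    directions = directions / np.linalg.norm(directions, axis=-1, keepdims=True)
    lobe_axes = lobe_axes / np.linalg.norm(lobe_axes, axis=-1, keepdims=True)
    cosines = directions @ lobe_axes.T                       # (N, K)
    weights = np.exp(sharpness[None, :] * (cosines - 1.0))   # (N, K)
    return weights @ amplitudes                              # (N, 3)

# Usage: one lobe pointing along +z, queried along +z and -z.
axes = np.array([[0.0, 0.0, 1.0]])
lam = np.array([10.0])
mu = np.array([[1.0, 2.0, 3.0]])
queries = np.array([[0.0, 0.0, 1.0], [0.0, 0.0, -1.0]])
radiance = eval_spherical_gaussians(queries, axes, lam, mu)
```

Along the lobe axis the exponent is zero, so the result equals the lobe amplitude; in the opposite direction it is attenuated by exp(-2 * lambda). Because each lobe is a closed-form function of direction, incident radiance can be queried for any direction without tracing secondary rays, which is the property the abstract relies on to avoid multi-bounce ray tracing.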
