IDArb: Intrinsic Decomposition for Arbitrary Number of Input Views and Illuminations
Zhibing Li, Tong Wu, Jing Tan, Mengchen Zhang, Jiaqi Wang, Dahua Lin
TL;DR
This work targets intrinsic decomposition from an arbitrary number of views under unconstrained illumination by introducing IDArb, a diffusion-based model built on a cross-view, cross-domain attention module. It leverages the ARB-Objaverse dataset (5.7M synthetic images across 68k models) and an illumination-augmented, view-adaptive training strategy to achieve multi-view-consistent estimates of albedo $\mathbf{A}$, normal $\mathbf{N}$, metallic $\mathbf{M}$, and roughness $\mathbf{R}$, without requiring camera poses. IDArb achieves state-of-the-art performance on synthetic and real data, enabling downstream tasks such as single-image relighting, material editing, and photometric stereo, while also providing priors that improve optimization-based inverse rendering. The approach shows practical potential for realistic 3D content creation and for inverse rendering under diverse lighting and view configurations.
Abstract
Capturing geometric and material information from images remains a fundamental challenge in computer vision and graphics. Traditional optimization-based methods often require hours of computational time to reconstruct geometry, material properties, and environmental lighting from dense multi-view inputs, while still struggling with inherent ambiguities between lighting and material. On the other hand, learning-based approaches leverage rich material priors from existing 3D object datasets but struggle to maintain multi-view consistency. In this paper, we introduce IDArb, a diffusion-based model designed to perform intrinsic decomposition on an arbitrary number of images under varying illuminations. Our method achieves accurate and multi-view-consistent estimation of surface normals and material properties. This is made possible through a novel cross-view, cross-domain attention module and an illumination-augmented, view-adaptive training strategy. Additionally, we introduce ARB-Objaverse, a new dataset that provides large-scale multi-view intrinsic data and renderings under diverse lighting conditions, supporting robust training. Extensive experiments demonstrate that IDArb outperforms state-of-the-art methods both qualitatively and quantitatively. Moreover, our approach facilitates a range of downstream tasks, including single-image relighting, photometric stereo, and 3D reconstruction, highlighting its broad applications in realistic 3D content creation.
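To make the idea of attending across both views and intrinsic domains concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes tokens are arranged as (views, domains, spatial tokens, channels) and simply flattens the first three axes into one sequence before a single-head self-attention, so every token can attend to every view and every domain. The function name, the tensor layout, the random projection matrices, and the choice of four domains are all illustrative assumptions; the abstract does not specify these details.

```python
import numpy as np


def cross_view_domain_attention(tokens, w_q, w_k, w_v):
    """Single-head self-attention over a joint (view, domain, token) sequence.

    tokens: array of shape (V, D, N, C), where V is the number of input views,
            D the number of intrinsic domains, N the spatial tokens per map,
            and C the channel width. This layout is an assumption.
    w_q, w_k, w_v: (C, C) projection matrices (hypothetical stand-ins for
            learned weights).
    Returns the attended tokens with shape (V, D, N, C) and the attention
    weights with shape (V*D*N, V*D*N).
    """
    V, D, N, C = tokens.shape
    # Flatten views, domains, and spatial positions into one sequence so that
    # attention is not restricted to a single view or a single domain.
    seq = tokens.reshape(V * D * N, C)
    q, k, v = seq @ w_q, seq @ w_k, seq @ w_v
    scores = q @ k.T / np.sqrt(q.shape[-1])
    # Numerically stable softmax over the key axis.
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    out = weights @ v
    return out.reshape(V, D, N, C), weights


# Usage: the same function handles any number of views, including one.
rng = np.random.default_rng(0)
C = 8
w_q, w_k, w_v = (rng.standard_normal((C, C)) for _ in range(3))
for num_views in (1, 3):
    x = rng.standard_normal((num_views, 4, 16, C))  # 4 domains is illustrative
    y, attn = cross_view_domain_attention(x, w_q, w_k, w_v)
    print(num_views, y.shape, attn.shape)
```

Because the sequence length is simply V·D·N, nothing in this sketch depends on a fixed view count or on camera poses, which is the property the abstract emphasizes. A practical model would likely factor the attention or use multiple heads to control the quadratic cost; that is a design consideration of this sketch, not a claim about IDArb.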
