ARM: Appearance Reconstruction Model for Relightable 3D Generation
Xiang Feng, Chang Yu, Zoubin Bi, Yintong Shang, Feng Gao, Hongzhi Wu, Kun Zhou, Chenfanfu Jiang, Yin Yang
TL;DR
ARM is a two-stage framework for relightable 3D generation from sparse views. It decouples geometry from appearance and synthesizes textures in UV space. GeoRM reconstructs geometry, while separate GlossyRM and InstantAlbedo components handle appearance, aided by a material prior that makes the decomposition of lighting and materials more robust. By back-projecting multi-view measurements into a UV atlas and processing them with a UV-space module that has a global receptive field, ARM produces detailed textures and realistic relighting and outperforms prior image-to-3D methods. Trained on 8 H100 GPUs, ARM improves on existing methods both quantitatively and qualitatively across geometry and appearance tasks, including multi-material and relightable scenarios, with potential applications in games, metaverse assets, and e-commerce.
Abstract
Recent image-to-3D reconstruction models have greatly advanced geometry generation, but they still struggle to faithfully generate realistic appearance. To address this, we introduce ARM, a novel method that reconstructs high-quality 3D meshes and realistic appearance from sparse-view images. The core of ARM lies in decoupling geometry from appearance, processing appearance within the UV texture space. Unlike previous methods, ARM improves texture quality by explicitly back-projecting measurements onto the texture map and processing them in a UV space module with a global receptive field. To resolve ambiguities between material and illumination in input images, ARM introduces a material prior that encodes semantic appearance information, enhancing the robustness of appearance decomposition. Trained on just 8 H100 GPUs, ARM outperforms existing methods both quantitatively and qualitatively.
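To make the back-projection idea concrete, the following is a minimal sketch of blending multi-view pixel colors into UV texels. It is an illustration under stated assumptions, not ARM's implementation: the function names (`project`, `backproject_to_uv`), the pinhole camera convention (`x_cam = R x + t`), the nearest-pixel lookup, and the cosine weighting are all hypothetical choices, and occlusion testing is omitted for brevity. Each texel is assumed to already carry a world-space position and unit normal obtained by rasterizing the mesh in UV space.

```python
import math

def project(point, K, R, t):
    """Project a world-space point with a pinhole camera (x_cam = R x + t).

    Returns (u, v) pixel coordinates, or None if the point is behind the camera.
    """
    cam = [sum(R[i][j] * point[j] for j in range(3)) + t[i] for i in range(3)]
    if cam[2] <= 1e-8:
        return None
    return (K[0][0] * cam[0] / cam[2] + K[0][2],
            K[1][1] * cam[1] / cam[2] + K[1][2])

def backproject_to_uv(texels, views):
    """Blend per-view colors into a UV texture (hypothetical helper).

    texels: dict mapping (row, col) -> (world position, unit normal).
    views:  list of dicts with keys "image" (rows of RGB pixels), "K", "R", "t".
    Returns a dict mapping (row, col) -> blended RGB for every texel that at
    least one view observes front-on. Occlusion is NOT checked in this sketch.
    """
    texture = {}
    for key, (pos, normal) in texels.items():
        accum, total = [0.0, 0.0, 0.0], 0.0
        for view in views:
            image, K, R, t = view["image"], view["K"], view["R"], view["t"]
            uv = project(pos, K, R, t)
            if uv is None:
                continue
            col, row = round(uv[0]), round(uv[1])  # nearest-pixel lookup
            if not (0 <= row < len(image) and 0 <= col < len(image[0])):
                continue
            # Camera center in world space is -R^T t.
            center = [-sum(R[i][j] * t[i] for i in range(3)) for j in range(3)]
            to_cam = [center[j] - pos[j] for j in range(3)]
            dist = math.sqrt(sum(c * c for c in to_cam))
            if dist == 0.0:
                continue
            # Weight by the cosine between the normal and the view direction,
            # so grazing and back-facing observations contribute little or nothing.
            cos = sum(normal[j] * to_cam[j] for j in range(3)) / dist
            if cos <= 0.0:
                continue
            pixel = image[row][col]
            accum = [a + cos * p for a, p in zip(accum, pixel)]
            total += cos
        if total > 0.0:
            texture[key] = [a / total for a in accum]
    return texture
```

In this sketch, texels that no view observes are simply absent from the result. Filling such gaps and refining the blended measurements is the kind of task a UV-space module with a global receptive field could address, since information from distant parts of the atlas may be needed.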
