VideoNeuMat: Neural Material Extraction from Generative Video Models
Bowen Xue, Saeed Hadadan, Zheng Zeng, Fabrice Rousselle, Zahra Montazeri, Milos Hasan
TL;DR
VideoNeuMat addresses the data bottleneck in photorealistic material authoring by extracting material knowledge from video diffusion models trained on internet-scale data. It finetunes a large video model to act as a virtual gonioreflectometer, then uses a Large Reconstruction Model (LRM) to map short material videos to NeuMIP-based neural materials that can be relit on novel shapes and from novel views. The resulting materials show greater realism and diversity than the limited synthetic training data and than prior diffusion-based methods, transferring knowledge from video priors into standalone 3D assets. This offers a practical, data-efficient way to produce reusable neural materials for photorealistic rendering across diverse scenes and lighting conditions.
Abstract
Creating photorealistic materials for 3D rendering requires exceptional artistic skill. Generative models for materials could help, but are currently limited by the lack of high-quality training data. While recent video generative models effortlessly produce realistic material appearances, this knowledge remains entangled with geometry and lighting. We present VideoNeuMat, a two-stage pipeline that extracts reusable neural material assets from video diffusion models. First, we finetune a large video model (Wan 2.1 14B) to generate material sample videos under controlled camera and lighting trajectories, effectively creating a "virtual gonioreflectometer" that preserves the model's material realism while learning a structured measurement pattern. Second, we reconstruct compact neural materials from these videos through a Large Reconstruction Model (LRM) finetuned from a smaller Wan 1.3B video backbone. From 17 generated video frames, our LRM performs single-pass inference to predict neural material parameters that generalize to novel viewing and lighting conditions. The resulting materials exhibit realism and diversity far exceeding the limited synthetic training data, demonstrating that material knowledge can be successfully transferred from internet-scale video models into standalone, reusable neural 3D assets.
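To make "neural material parameters that generalize to novel viewing and lighting conditions" concrete, the following is a minimal sketch of how a neural material of this general kind can be queried: a latent feature texture is looked up at a surface coordinate, and a small MLP decodes the feature together with the view and light directions into RGB reflectance. This is not the paper's architecture. The function names, tensor sizes, and random weights are all hypothetical stand-ins for what a reconstruction model would predict, and the sketch leaves out any multi-resolution or parallax handling.

```python
import numpy as np

def make_material(res=8, feat_dim=4, hidden=16, seed=0):
    """Random stand-in for predicted material parameters (hypothetical sizes)."""
    rng = np.random.default_rng(seed)
    return {
        "features": rng.normal(size=(res, res, feat_dim)),  # latent feature texture
        "w1": rng.normal(size=(feat_dim + 4, hidden)) * 0.5,  # decoder layer 1
        "b1": np.zeros(hidden),
        "w2": rng.normal(size=(hidden, 3)) * 0.5,  # decoder layer 2 -> RGB
        "b2": np.zeros(3),
    }

def evaluate(material, uv, view_dir, light_dir):
    """Return RGB reflectance at texture coordinate uv for the given directions.

    uv: (2,) array with values in [0, 1).
    view_dir, light_dir: (3,) unit vectors in the local tangent frame, with z
    along the surface normal. Only their x and y components are passed to the
    decoder, since z follows from them on the upper hemisphere.
    """
    tex = material["features"]
    res = tex.shape[0]
    # Nearest-texel lookup with wrap-around (no filtering, for brevity).
    i = int(uv[1] * res) % res
    j = int(uv[0] * res) % res
    x = np.concatenate([tex[i, j], view_dir[:2], light_dir[:2]])
    h = np.maximum(x @ material["w1"] + material["b1"], 0.0)  # ReLU
    out = h @ material["w2"] + material["b2"]
    return np.logaddexp(0.0, out)  # softplus keeps reflectance non-negative

material = make_material()
view = np.array([0.0, 0.0, 1.0])   # camera directly above the surface
light = np.array([0.6, 0.0, 0.8])  # light tilted away from the normal
rgb = evaluate(material, np.array([0.25, 0.75]), view, light)
```

Relighting on a new shape then amounts to calling `evaluate` per shading point with that point's texture coordinate and local view and light directions. Once the parameters are predicted, the asset no longer depends on the video model that produced it.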
