Seeing a Rose in Five Thousand Ways
Yunzhi Zhang, Shangzhe Wu, Noah Snavely, Jiajun Wu
TL;DR
This work tackles the problem of learning object intrinsics (3D geometry, texture, and material properties) from a single image containing multiple instances of the same object type. It introduces a generative framework that represents intrinsics with neural fields (e.g., a 3D shape via a signed distance function $f_\theta$, albedo via $a_\psi$, and a shininess parameter $\alpha$) conditioned on a latent code. These intrinsics are rendered under extrinsic factors such as pose and illumination, using a Phong-like lighting model and neural volume rendering. Training uses an adversarial setup on image crops, with pose-aware regularization and scale/translation augmentations to enforce 3D consistency and robustness in the limited-data regime. The model recovers object intrinsics from both in-the-wild and synthetic images, enabling novel-view synthesis, relighting, and generation. It outperforms baselines such as GNeRF, Neural-PIL, and NeRD on multiple metrics, with notable gains in depth, albedo, and image realism. This approach offers a practical pathway to 3D-aware generation from minimal data, with applications in shape reconstruction, relighting, and controllable instance generation in real-world scenes.
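To make the roles of albedo and shininess concrete, the following is a minimal sketch of textbook Phong shading at a single surface point. It is not the paper's exact formulation: the function name `phong_shade`, the coefficients `k_ambient`, `k_diffuse`, and `k_specular`, and their default values are assumptions chosen for illustration. In the described framework, the surface normal would presumably come from the gradient of the signed distance function $f_\theta$, the albedo from $a_\psi$, and the exponent from $\alpha$.

```python
import math

def normalize(v):
    """Return v scaled to unit length."""
    norm = math.sqrt(sum(c * c for c in v))
    return tuple(c / norm for c in v)

def dot(a, b):
    """Dot product of two 3-vectors."""
    return sum(x * y for x, y in zip(a, b))

def phong_shade(albedo, normal, light_dir, view_dir, shininess,
                k_ambient=0.1, k_diffuse=0.7, k_specular=0.2):
    """Shade one surface point with the classic Phong model.

    albedo    -- RGB tuple in [0, 1] (stand-in for the output of a_psi)
    normal    -- surface normal (e.g. the normalized SDF gradient)
    light_dir -- direction from the surface point toward the light
    view_dir  -- direction from the surface point toward the camera
    shininess -- specular exponent (stand-in for alpha)
    The k_* weights are illustrative assumptions, not values from the paper.
    """
    n = normalize(normal)
    l = normalize(light_dir)
    v = normalize(view_dir)

    n_dot_l = max(dot(n, l), 0.0)  # diffuse term, clamped for back-facing light

    # Mirror reflection of the light direction about the normal: r = 2(n.l)n - l
    r = tuple(2.0 * dot(n, l) * nc - lc for nc, lc in zip(n, l))
    # No specular highlight when the light is behind the surface.
    spec = max(dot(r, v), 0.0) ** shininess if n_dot_l > 0.0 else 0.0

    # Ambient and diffuse terms are tinted by albedo; the specular term is white.
    return tuple(
        min(1.0, a * (k_ambient + k_diffuse * n_dot_l) + k_specular * spec)
        for a in albedo
    )

# Example: light and camera both directly above a gray, upward-facing surface.
color = phong_shade((0.5, 0.5, 0.5), (0, 0, 1), (0, 0, 1), (0, 0, 1), shininess=10)
```

A larger `shininess` concentrates the specular highlight into a narrower lobe, which is why a single scalar can distinguish matte from glossy materials in this kind of model.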
Abstract
What is a rose, visually? A rose comprises its intrinsics, including the distribution of geometry, texture, and material specific to its object category. With knowledge of these intrinsic properties, we may render roses of different sizes and shapes, in different poses, and under different lighting conditions. In this work, we build a generative model that learns to capture such object intrinsics from a single image, such as a photo of a bouquet. Such an image includes multiple instances of an object type. These instances all share the same intrinsics, but appear different due to a combination of variance within these intrinsics and differences in extrinsic factors, such as pose and illumination. Experiments show that our model successfully learns object intrinsics (distribution of geometry, texture, and material) for a wide range of objects, each from a single Internet image. Our method achieves superior results on multiple downstream tasks, including intrinsic image decomposition, shape and image generation, view synthesis, and relighting.
