DreamTexture: Shape from Virtual Texture with Analysis by Augmentation
Ananta R. Bhattarai, Xingzhe He, Alla Sheffer, Helge Rhodin
TL;DR
DreamTexture introduces a monocular, single-view 3D reconstruction framework built on Analysis by Augmentation (AbA): a virtual texture is overlaid on the input image and aligned with it under a diffusion-prior loss. The method splits the problem into texture-coordinate optimization (Stage I) and depth optimization via a Least Squares Conformal Mapping (LSCM) energy (Stage II), which yields a depth map without memory-intensive volumetric representations. By treating the texture distortions induced by the diffusion prior as depth cues that LSCM can recover, DreamTexture performs competitively with, or better than, multi-view baselines on synthetic and in-the-wild images, with clear advantages in efficiency and in generalization to textureless or irregularly textured objects. The AbA paradigm broadens the use of pre-trained generative models for unsupervised 3D reconstruction and suggests scalable applications such as logo augmentation and video-based temporal reconstruction.
Abstract
DreamFusion established a new paradigm for unsupervised 3D reconstruction from virtual views by combining advances in generative models and differentiable rendering. However, the underlying multi-view rendering, along with supervision from large-scale generative models, is computationally expensive and under-constrained. We propose DreamTexture, a novel Shape-from-Virtual-Texture approach that leverages monocular depth cues to reconstruct 3D objects. Our method textures an input image by aligning a virtual texture with the real depth cues in the input, exploiting the inherent understanding of monocular geometry encoded in modern diffusion models. We then reconstruct depth from the virtual texture deformation with a new conformal map optimization, which alleviates the need for memory-intensive volumetric representations. Our experiments reveal that generative models possess an understanding of monocular shape cues, which can be extracted by augmenting and aligning texture cues -- a novel monocular reconstruction paradigm that we call Analysis by Augmentation.
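To make the idea of reading depth from texture deformation concrete, the following is a minimal sketch of a discrete conformal (LSCM-style) energy. It is not the authors' implementation. It assumes orthographic projection, a regular pixel-grid triangulation, and fixed texture coordinates; the names `lscm_energy` and `grid_mesh` are hypothetical. The energy sums, over triangles, the area-weighted Cauchy-Riemann residual of the map from the 3D surface to texture space, so it is near zero only when the texture is an angle-preserving image of the surface.

```python
import numpy as np

def lscm_energy(verts, faces, uv):
    """Area-weighted Cauchy-Riemann residual of the piecewise-linear map surface -> uv.

    verts: (V, 3) surface points, faces: (F, 3) vertex indices, uv: (V, 2) texture coords.
    """
    p0, p1, p2 = (verts[faces[:, k]] for k in range(3))
    e01, e02 = p1 - p0, p2 - p0
    normal = np.cross(e01, e02)
    dbl_area = np.linalg.norm(normal, axis=1)           # twice the triangle area
    # Orthonormal frame (ex, ey) inside each triangle's plane.
    len01 = np.linalg.norm(e01, axis=1)
    ex = e01 / len01[:, None]
    ey = np.cross(normal / dbl_area[:, None], ex)
    # Local 2D vertex coordinates: q0 = (0, 0), q1 = (x1, 0), q2 = (x2, y2).
    x1 = len01
    x2 = np.einsum("ij,ij->i", e02, ex)
    y2 = np.einsum("ij,ij->i", e02, ey)
    # Per-triangle gradient of (u, v) with respect to the local frame.
    d1 = uv[faces[:, 1]] - uv[faces[:, 0]]
    d2 = uv[faces[:, 2]] - uv[faces[:, 0]]
    gx = d1 / x1[:, None]                                # (u_x, v_x)
    gy = (d2 - x2[:, None] * gx) / y2[:, None]           # (u_y, v_y)
    residual = (gx[:, 0] - gy[:, 1]) ** 2 + (gy[:, 0] + gx[:, 1]) ** 2
    return float(np.sum(0.5 * dbl_area * residual))

def grid_mesh(n):
    """Pixel grid of n x n vertices, each cell split into two triangles."""
    ys, xs = np.meshgrid(np.arange(n), np.arange(n), indexing="ij")
    xy = np.stack([xs.ravel(), ys.ravel()], axis=1).astype(float)
    idx = np.arange(n * n).reshape(n, n)
    a, b = idx[:-1, :-1].ravel(), idx[:-1, 1:].ravel()
    c, d = idx[1:, 1:].ravel(), idx[1:, :-1].ravel()
    faces = np.concatenate([np.stack([a, b, c], 1), np.stack([a, c, d], 1)])
    return xy, faces

xy, faces = grid_mesh(4)
flat = np.column_stack([xy, np.zeros(len(xy))])          # constant depth
slanted = np.column_stack([xy, xy[:, 0]])                # depth z = x (45-degree slant)
# Texture coordinates stretched along x, as a texture lying on the slanted plane would appear.
uv_stretched = np.column_stack([np.sqrt(2.0) * xy[:, 0], xy[:, 1]])

e_flat = lscm_energy(flat, faces, uv_stretched)
e_slanted = lscm_energy(slanted, faces, uv_stretched)
print(e_flat > e_slanted)  # the slanted depth explains the texture stretch better
```

In this toy setting the stretched texture coordinates are far from conformal on the flat depth map but nearly conformal on the slanted one, so minimizing the energy over depth with the texture coordinates held fixed favors the slanted surface. A practical version would use a differentiable framework and a perspective camera model, which this sketch leaves out.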
