DreamTexture: Shape from Virtual Texture with Analysis by Augmentation

Ananta R. Bhattarai, Xingzhe He, Alla Sheffer, Helge Rhodin

TL;DR

DreamTexture is a single-view (monocular) 3D reconstruction framework built on Analysis by Augmentation (AbA): a virtual texture is augmented onto the input image under the guidance of a diffusion-prior loss. The method splits the problem into texture-coordinate optimization (Stage I) and depth optimization with a Least Squares Conformal Mapping (LSCM) energy (Stage II), which yields a depth map without memory-intensive volumetric representations. By treating the texture distortions induced by the diffusion prior as depth cues through LSCM, DreamTexture performs competitively with, or better than, multi-view baselines on synthetic and in-the-wild images, with advantages in efficiency and in generalization to textureless or irregularly textured objects. The AbA paradigm broadens the use of pre-trained generative models for unsupervised 3D reconstruction and suggests applications such as logo augmentation and video-based temporal reconstruction.
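
For orientation, the diffusion-prior loss is identified in the Figure 1 caption as Score Distillation Sampling (SDS). The gradient below is the general SDS form introduced by DreamFusion, not an equation taken from this paper. Reading $\theta$ as the Stage I texture coordinates and $x$ as the texture-augmented image is an assumption based on this summary; $\hat{\epsilon}_\phi$ is the pre-trained denoiser, $y$ the conditioning prompt, $x_t$ the noised image at timestep $t$, and $w(t)$ a timestep weighting.

```latex
\nabla_\theta \mathcal{L}_{\mathrm{SDS}}
  = \mathbb{E}_{t,\epsilon}\!\left[
      w(t)\,\bigl(\hat{\epsilon}_\phi(x_t;\, y,\, t) - \epsilon\bigr)\,
      \frac{\partial x}{\partial \theta}
    \right]
```

Under this reading, the denoiser's residual is pushed back through the rendering of the virtual texture onto the image, so only the texture coordinates are updated and the diffusion model itself stays frozen.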

Abstract

DreamFusion established a new paradigm for unsupervised 3D reconstruction from virtual views by combining advances in generative models and differentiable rendering. However, the underlying multi-view rendering, along with supervision from large-scale generative models, is computationally expensive and under-constrained. We propose DreamTexture, a novel Shape-from-Virtual-Texture approach that leverages monocular depth cues to reconstruct 3D objects. Our method textures an input image by aligning a virtual texture with the real depth cues in the input, exploiting the inherent understanding of monocular geometry encoded in modern diffusion models. We then reconstruct depth from the virtual texture deformation with a new conformal map optimization, which alleviates the need for memory-intensive volumetric representations. Our experiments reveal that generative models possess an understanding of monocular shape cues, which can be extracted by augmenting and aligning texture cues. We call this novel monocular reconstruction paradigm Analysis by Augmentation.
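
To make the conformal map optimization more concrete, here is a minimal sketch of a per-triangle LSCM energy evaluated on a depth map that is treated as a triangle mesh. It uses the standard complex-number form of LSCM (Lévy et al.), not the paper's implementation. The function names, the grid triangulation, the orthographic lifting of pixel `(x, y)` to `(x, y, depth)`, and the toy data are all assumptions made for illustration.

```python
import math

def to_local_plane(p1, p2, p3):
    """Express a 3D triangle in its own plane as three complex numbers.

    Returns the local coordinates and twice the triangle area.
    """
    e1 = [b - a for a, b in zip(p1, p2)]
    e2 = [b - a for a, b in zip(p1, p3)]
    len1 = math.sqrt(sum(c * c for c in e1))
    x_axis = [c / len1 for c in e1]
    x3 = sum(a * b for a, b in zip(e2, x_axis))
    y3 = math.sqrt(sum((a - x3 * b) ** 2 for a, b in zip(e2, x_axis)))
    return (0j, complex(len1, 0.0), complex(x3, y3)), len1 * y3

def lscm_energy(depth, tex):
    """Sum of per-triangle LSCM energies.

    depth[i][j] is the depth at pixel (x=j, y=i); tex[i][j] is the texture
    coordinate at that pixel stored as the complex number u + 1j * v.
    """
    def vertex(i, j):
        return (float(j), float(i), depth[i][j])

    rows, cols = len(depth), len(depth[0])
    energy = 0.0
    for i in range(rows - 1):
        for j in range(cols - 1):
            # Two counter-clockwise triangles per grid cell.
            cell = (((i, j), (i, j + 1), (i + 1, j)),
                    ((i, j + 1), (i + 1, j + 1), (i + 1, j)))
            for tri in cell:
                (z1, z2, z3), doubled_area = to_local_plane(
                    *(vertex(a, b) for a, b in tri))
                u1, u2, u3 = (tex[a][b] for a, b in tri)
                # Discrete Cauchy-Riemann residual of the texture map.
                s = (z3 - z2) * u1 + (z1 - z3) * u2 + (z2 - z1) * u3
                energy += abs(s) ** 2 / (2.0 * doubled_area)
    return energy

# Toy data: a texture stretched by sqrt(2) along x, as it would appear
# on a plane tilted by 45 degrees (depth = x).
n = 4
tex = [[complex(math.sqrt(2.0) * j, i) for j in range(n)] for i in range(n)]
flat = [[0.0] * n for _ in range(n)]
tilted = [[float(j) for j in range(n)] for _ in range(n)]
e_flat = lscm_energy(flat, tex)
e_tilted = lscm_energy(tilted, tex)
```

In this toy case the stretched texture is conformal on the tilted plane but not on the flat one, so `e_tilted` is numerically zero while `e_flat` is positive. Minimizing the energy over `depth` with the texture coordinates held fixed would therefore favor the tilt. The energy alone does not distinguish `depth = x` from `depth = -x` in this example.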

Paper Structure

This paper contains 35 sections, 10 equations, 17 figures, 19 tables.

Figures (17)

  • Figure 1: Teaser. While existing analysis-by-synthesis approaches (left) rely on volumetric rendering of multiple real or virtual views, DreamTexture (right) takes one image as input, augments a virtual texture, and optimizes the output depth map by aligning the real and virtual shape cues with a pre-trained image prior via Score Distillation Sampling (SDS). This analysis by augmentation (AbA) approach reconstructs depth without any 3D supervision and, unlike classical shape from texture, applies to both textured and textureless objects.
  • Figure 2: Analysis by Augmentation. The virtual texture is augmented and incrementally aligned with the perspective cues in the image (left to right: initialization, 1k iterations, 11k iterations).
  • Figure 3: DreamTexture overview. In Stage I, we use the SDS loss to faithfully augment the texture on the input image, producing texture coordinates that align the virtual texture with depth cues in the input image. In Stage II, we apply shape from virtual texture by viewing the depth map as the $z$-coordinates of a 3D mesh and optimize it using the LSCM energy, minimizing angular texture distortion.
  • Figure 4: Text-to-3D comparison. Compared to RealFusion (RF) and DreamFusion (DF), our method achieves more accurate reconstructions, particularly for objects that lack prominent visual features. We use the same text prompt as DreamFusion to generate input images with Stable Diffusion. To render novel views in our approach, we resample the input image using the inverse texture coordinates and render using PyTorch3D with orthographic projection (elevation $=0^{\circ}$) and azimuth angles of $0^{\circ}$ (view 1), $-10^{\circ}$ (view 2), and $+10^{\circ}$ (view 3).
  • Figure 5: Qualitative evaluation on PrimitiveShapesX. Our method reconstructs depth maps with high fidelity, producing accurate surface normals for smooth and sharp-edged objects.
  • ...and 12 more figures
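
The novel-view setup in the Figure 4 caption (orthographic projection, elevation $0^{\circ}$, azimuth $0^{\circ}$ and $\pm 10^{\circ}$) reduces to a rotation about the vertical axis followed by dropping the depth coordinate. The sketch below shows only that geometry in plain Python. It does not use the PyTorch3D API, and the function names, sample vertices, and rotation sign convention are assumptions that may differ from the paper's renderer.

```python
import math

def rotate_azimuth(point, azimuth_deg):
    """Rotate a 3D point about the vertical (y) axis by the given azimuth."""
    x, y, z = point
    a = math.radians(azimuth_deg)
    return (math.cos(a) * x + math.sin(a) * z,
            y,
            -math.sin(a) * x + math.cos(a) * z)

def orthographic_view(points, azimuth_deg):
    """Orthographic projection at elevation 0: rotate, then drop depth."""
    return [rotate_azimuth(p, azimuth_deg)[:2] for p in points]

# Hypothetical vertices lifted from a depth map as (x, y, depth).
vertices = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.5), (0.0, 1.0, 0.2)]

# The three azimuth angles named in the Figure 4 caption.
views = {az: orthographic_view(vertices, az) for az in (0.0, -10.0, 10.0)}
```

At azimuth $0^{\circ}$ the projection reproduces the input pixel positions. At $\pm 10^{\circ}$ each vertex shifts horizontally in proportion to its depth, which is what makes errors in the reconstructed depth visible in the side views.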