DogWeave: High-Fidelity 3D Canine Reconstruction from a Single Image via Normal Fusion and Conditional Inpainting

Shufan Sun, Chenchen Wang, Zongfu Yu

TL;DR

DogWeave is a model-based framework that reconstructs high-fidelity 3D canine models from a single RGB image and outperforms state-of-the-art single-image-to-3D reconstruction methods in both shape accuracy and texture realism for canines.

Abstract

Monocular 3D animal reconstruction is challenging due to complex articulation, self-occlusion, and fine-scale details such as fur. Existing methods often produce distorted geometry and inconsistent textures due to the lack of articulated 3D supervision and the limited availability of back-view images in 2D datasets, which makes reconstructing unobserved regions particularly difficult. To address these limitations, we propose DogWeave, a model-based framework for reconstructing high-fidelity 3D canine models from a single RGB image. DogWeave improves geometry by refining a coarsely initialized parametric mesh into a detailed SDF representation through multi-view normal field optimization using diffusion-enhanced normals. It then generates view-consistent textures through conditional partial inpainting guided by structure and style cues, enabling realistic reconstruction of unobserved regions. Using only about 7,000 dog images processed via our 2D pipeline for training, DogWeave produces complete, realistic 3D models and outperforms state-of-the-art single-image-to-3D reconstruction methods in both shape accuracy and texture realism for canines.
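
As a rough illustration of what optimizing an SDF against target normals involves, the sketch below evaluates a cosine-based normal loss on a toy sphere SDF. It relies only on the general fact that the unit normal of an SDF surface is the normalized gradient of the SDF. The function names (sphere_sdf, sdf_normal, normal_loss), the finite-difference gradient, and the loss form are assumptions made for this example and are not taken from the paper; DogWeave's actual objective and its diffusion-enhanced normals are not reproduced here.

```python
import math

def sphere_sdf(p, radius=1.0):
    """Signed distance from point p to a sphere centered at the origin (toy shape)."""
    return math.sqrt(sum(c * c for c in p)) - radius

def sdf_normal(sdf, p, eps=1e-4):
    """Unit normal at p: normalized central-difference gradient of the SDF."""
    grad = []
    for i in range(3):
        hi, lo = list(p), list(p)
        hi[i] += eps
        lo[i] -= eps
        grad.append((sdf(hi) - sdf(lo)) / (2.0 * eps))
    norm = math.sqrt(sum(g * g for g in grad))
    return [g / norm for g in grad]

def normal_loss(sdf, points, target_normals):
    """Mean of (1 - cosine similarity) between SDF normals and unit target normals."""
    total = 0.0
    for p, t in zip(points, target_normals):
        n = sdf_normal(sdf, p)
        total += 1.0 - sum(a * b for a, b in zip(n, t))
    return total / len(points)

# Hypothetical surface samples; on a unit sphere the true normal equals the position.
points = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
targets = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]

loss_matched = normal_loss(sphere_sdf, points, targets)
loss_flipped = normal_loss(sphere_sdf, points, [tuple(-c for c in t) for t in targets])
```

In this toy setup the loss is near zero when the targets agree with the SDF normals and near two when they point the opposite way. In a multi-view setting the targets would presumably come from per-view normal maps, and the SDF parameters would be updated by gradient descent on such a loss.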

Paper Structure (19 sections, 9 equations, 7 figures, 2 tables)

Figures (7)

  • Figure 1: DogWeave reconstructs photorealistic 3D dog models from single images, achieving state-of-the-art texture fidelity and geometry-texture coherence.
  • Figure 2: DogWeave reconstructs high-fidelity 3D dog models from a single RGB image using a three-stage pipeline. Stage 1: Coarse Shape Initialization generates a base mesh with BITE [rueegg2023bite] and refines overall proportions. Stage 2: Surface Detail Enhancement converts the mesh to a volumetric SDF and incorporates diffusion-enhanced multi-view normals for fine geometric details. Stage 3: Sequential Texturing produces photorealistic, identity-consistent appearance through style- and breed-conditioned inpainting.
  • Figure 3: Normal optimization. We progressively refine geometry from the BITE base mesh (light arrows) and through SDF optimization with multi-view normal fusion (dark arrows) to recover fine-scale surface details. The right column compares to CraftsMan3D [li2024craftsman], which exhibits domain drift when species constraints are not applied.
  • Figure 4: Breed Information Visualizations. Upper left: breed information guides the texturing pipeline. Lower left: improved facial feature precision when breed information for this Toy Terrier is included. Right: robust generalization to unseen breeds.
  • Figure 5: Qualitative comparison across breeds and poses. Fauna [li2024fauna] produces coarse textures. CRM [wang2024crm] and TripoSR [TripoSR2024] show color drift and topology errors. Wonder3D [long2023wonder3d] exhibits geometry bias. Trellis [trellis] and SAM3D [sam3dteam2025sam3d3dfyimages] show similar artifacts. Hunyuan3D [hunyuan3d22025tencent, yang2024hunyuan3d, lai2025hunyuan3d25highfidelity3d] generates good geometry but blurred textures in occluded regions.
  • ...and 2 more figures