
TAPESTRY: From Geometry to Appearance via Consistent Turntable Videos

Yan Zeng, Haoran Jiang, Kaixin Yao, Qixuan Zhang, Longwen Zhang, Lan Xu, Jingyi Yu

Abstract

Automatically generating photorealistic and self-consistent appearances for untextured 3D models is a critical challenge in digital content creation. The advancement of large-scale video generation models offers a natural approach: directly synthesizing 360-degree turntable videos (TTVs), which can serve not only as high-quality dynamic previews but also as an intermediate representation to drive texture synthesis and neural rendering. However, existing general-purpose video diffusion models struggle to maintain strict geometric consistency and appearance stability across the full range of views, making their outputs ill-suited for high-quality 3D reconstruction. To this end, we introduce TAPESTRY, a framework for generating high-fidelity TTVs conditioned on explicit 3D geometry. We reframe the 3D appearance generation task as a geometry-conditioned video diffusion problem: given a 3D mesh, we first render and encode multi-modal geometric features to constrain the video generation process with pixel-level precision, thereby enabling the creation of high-quality and consistent TTVs. Building upon this, we also design a method for downstream reconstruction tasks from the TTV input, featuring a multi-stage pipeline with 3D-Aware Inpainting. By rotating the model and performing a context-aware secondary generation, this pipeline effectively completes self-occluded regions to achieve full surface coverage. The videos generated by TAPESTRY are not only high-quality dynamic previews but also serve as a reliable, 3D-aware intermediate representation that can be seamlessly back-projected into UV textures or used to supervise neural rendering methods like 3DGS. This enables the automated creation of production-ready, complete 3D assets from untextured meshes. Experimental results demonstrate that our method outperforms existing approaches in both video consistency and final reconstruction quality.
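As an illustration of the turntable setup the abstract refers to, the following minimal sketch computes camera centers evenly spaced on a full 360-degree orbit around an object at the origin. It is not taken from the paper: the function name `turntable_camera_positions`, the y-up convention, and the frame count, radius, and elevation values are all assumptions chosen for illustration. In a geometry-conditioned pipeline, each such viewpoint would be used to render the mesh's geometric features for one video frame.

```python
import math

def turntable_camera_positions(num_frames, radius, elevation_deg=0.0):
    """Return camera centers evenly spaced on a full 360-degree orbit.

    Hypothetical helper: assumes a y-up frame with the object at the
    origin, and that every camera looks at the origin.
    """
    elevation = math.radians(elevation_deg)
    positions = []
    for i in range(num_frames):
        # Azimuth sweeps one full turn; the last frame stops one step
        # short of 360 degrees so the video loops without a repeat.
        azimuth = 2.0 * math.pi * i / num_frames
        x = radius * math.cos(elevation) * math.sin(azimuth)
        y = radius * math.sin(elevation)
        z = radius * math.cos(elevation) * math.cos(azimuth)
        positions.append((x, y, z))
    return positions

# Illustrative values only; the paper's actual settings are not given here.
cameras = turntable_camera_positions(num_frames=8, radius=2.0, elevation_deg=15.0)
```

All cameras lie at the same distance from the object and at the same height, so consecutive frames differ only by a fixed rotation about the vertical axis.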

Paper Structure (9 sections, 1 equation, 6 figures, 3 tables)

Figures (6)

  • Figure 1: We introduce TAPESTRY for high-fidelity 3D appearance generation by synthesizing a reconstructable Turntable Video through strong geometric conditioning. This highly consistent video then serves as a robust data source to create a final, high-quality asset.
  • Figure 2: An overview of the TAPESTRY architecture. (a) Geometry-guided video generation. Our method generates a 3D consistent Turntable Video by injecting multi-modal geometric conditions and reference context into a DiT-based video diffusion model. (b) Our progressive texturing pipeline. We iteratively generate TTVs from new, optimized viewpoints and fuse their projections via Texture Baking. Each pass is conditioned on previously generated textures to ensure global consistency, continuing until full coverage is achieved.
  • Figure 3: Qualitative results of TAPESTRY. Our method generates a consistent Turntable Video, from which a complete and high-fidelity textured asset is produced.
  • Figure 4: Qualitative comparison. The controllable video baselines suffer from significant appearance drift and the Janus problem. In contrast, our method maintains strong object identity and consistency throughout the rotation, effectively eliminating such artifacts.
  • Figure 6: The initial Iter. 1 result has an incomplete texture due to self-occlusion. The w/o Inpainting baseline, which generates a second TTV independently, suffers from visible seams and color shifts. In contrast, our full pipeline with context-aware inpainting produces a seamless and globally consistent result.
  • ...and 1 more figure