CaliTex: Geometry-Calibrated Attention for View-Coherent 3D Texture Generation

Chenyu Liu, Hongze Chen, Jingzhi Bao, Lingting Zhu, Runze Zhang, Weikai Chen, Zeyu Hu, Yingda Yin, Keyang Luo, Xin Wang

TL;DR

This work tackles cross-view texture inconsistencies in diffusion-based 3D texture generation by diagnosing attention ambiguity among geometry, reference-image, and noise tokens. It proposes CaliTex, a geometry-calibrated attention framework that embeds Part-Aligned Attention (PAA) and Condition-Routed Attention (CRA) in a two-stage diffusion transformer to encode geometric priors into the generation process. The approach yields seamless, view-consistent textures and outperforms open-source and commercial baselines across quantitative metrics and user studies. Ablation studies confirm that both PAA and CRA are essential for reducing cross-view misalignment and preventing cross-modal copying.

Abstract

Despite major advances brought by diffusion-based models, current 3D texture generation systems remain hindered by cross-view inconsistency: textures that appear convincing from one viewpoint often fail to align across others. We find that this issue arises from attention ambiguity, where unstructured full attention is applied indiscriminately across tokens and modalities, causing geometric confusion and unstable appearance-structure coupling. To address this, we introduce CaliTex, a framework of geometry-calibrated attention that explicitly aligns attention with 3D structure. It comprises two modules: Part-Aligned Attention, which enforces spatial alignment across semantically matched parts, and Condition-Routed Attention, which routes appearance information through geometry-conditioned pathways to maintain spatial fidelity. Coupled with a two-stage diffusion transformer, CaliTex makes geometric coherence an inherent behavior of the network rather than a byproduct of optimization. Empirically, CaliTex produces seamless and view-consistent textures and outperforms both open-source and commercial baselines.
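
The abstract does not spell out how Part-Aligned Attention is implemented, so the following is only a minimal sketch of the general idea it describes: restricting attention to tokens that share a semantic part. The function names `part_mask` and `masked_attention`, the integer part labels, and the plain-list tensors are all assumptions made for illustration, not the authors' code.

```python
import math

def part_mask(part_ids):
    """mask[i][j] is True when tokens i and j carry the same part label."""
    return [[pi == pj for pj in part_ids] for pi in part_ids]

def masked_attention(queries, keys, values, mask):
    """Scaled dot-product attention in which disallowed pairs get zero weight.

    Hypothetical stand-in for part-restricted attention; a real model
    would operate on batched tensors with multiple heads.
    """
    dim = len(queries[0])
    outputs = []
    for i, q in enumerate(queries):
        # Score only the keys this query is allowed to attend to.
        scores = {
            j: sum(a * b for a, b in zip(q, k)) / math.sqrt(dim)
            for j, k in enumerate(keys)
            if mask[i][j]
        }
        # Softmax over the allowed keys (shifted by the max for stability).
        peak = max(scores.values())
        weights = {j: math.exp(s - peak) for j, s in scores.items()}
        total = sum(weights.values())
        # Weighted sum of the allowed value vectors.
        out = [0.0] * len(values[0])
        for j, w in weights.items():
            for d, v in enumerate(values[j]):
                out[d] += (w / total) * v
        outputs.append(out)
    return outputs
```

With tokens from several views concatenated into one sequence and labeled by part, a token on one part can then only aggregate values from tokens on that same part, in whichever view they appear. In this toy setting the mask always admits the token itself, so the softmax never runs over an empty set.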

Paper Structure

This paper contains 24 sections, 9 equations, 6 figures, 2 tables.

Figures (6)

  • Figure 1: A collection of 3D objects textured by our method, demonstrating high-fidelity, seamless and geometry-aligned textures facilitated by our framework with geometry-calibrated attention. Visit our project website at https://calitex-project.github.io.
  • Figure 2: Illustration of issues caused by attention ambiguity and our proposed solutions. Zoom in for more details. (a) The model confuses the left limb in the second view with the right limb, producing seams in the texture. (b) Our Part-Aligned Attention constrains attention computation within semantic parts, effectively eliminating cross-view inconsistency. (c) The model directly copies visually similar regions from the reference image, leading to misalignment with the geometry condition. (d) Our Condition-Routed Attention ensures geometry-aligned texture generation, correcting the distortion on the clothing, as highlighted in the bottom-right.
  • Figure 3: Overview of our method. (a) We employ a two-stage generation framework: the Single-View DiT captures intra-view correlations, while the Multi-View DiT enhances geometric alignment and cross-view consistency using (b) Condition-Routed Attention and (c) Part-Aligned Attention. The generated multi-view images are then projected back and inpainted to produce the final 3D texture.
  • Figure 4: Qualitative comparison with recent methods. We compare our approach with both open-source and commercial models (marked with *) on various objects. Regions highlighted in yellow indicate seams or cross-view inconsistencies, while regions highlighted in blue denote misalignment with the underlying geometry. Please zoom in for more details. Meshes are generated by Hunyuan3D-3.0.
  • Figure 5: Ablation study of Part-Aligned Attention. Without Part-Aligned Attention, ambiguous cross-view attention causes incorrect alignment across views, while our method yields correct results.
  • ...and 1 more figure