RoCoTex: A Robust Method for Consistent Texture Synthesis with Diffusion Models

Jangyeong Kim, Donggoo Kang, Junyoung Choi, Jeonga Wi, Junho Gwon, Jiun Bae, Dumim Yoon, Junghyun Han

TL;DR

This paper proposes a robust text-to-texture method that generates consistent, seamless textures well aligned with the mesh. It leverages state-of-the-art 2D diffusion models, including SDXL and multiple ControlNets, to capture structural features and intricate details in the generated textures.

Abstract

Text-to-texture generation has recently attracted increasing attention, but existing methods often suffer from the problems of view inconsistencies, apparent seams, and misalignment between textures and the underlying mesh. In this paper, we propose a robust text-to-texture method for generating consistent and seamless textures that are well aligned with the mesh. Our method leverages state-of-the-art 2D diffusion models, including SDXL and multiple ControlNets, to capture structural features and intricate details in the generated textures. The method also employs a symmetrical view synthesis strategy combined with regional prompts for enhancing view consistency. Additionally, it introduces novel texture blending and soft-inpainting techniques, which significantly reduce the seam regions. Extensive experiments demonstrate that our method outperforms existing state-of-the-art methods.

Paper Structure (29 sections, 4 equations, 13 figures, 2 tables)

Figures (13)

  • Figure 1: This paper presents RoCoTex, a diffusion-based text-to-texture generation method that addresses the challenges of synthesizing view-consistent, well-aligned, seamless, and high-quality textures.
  • Figure 2: 2D diffusion-based texturing poses several challenges: (a) The front and back views are inconsistent; the textured character suffers from the Janus problem. (b) The texture is not aligned with the underlying mesh. (c) The textured surfaces show many artifacts, including seams. The left images are generated by Text2Tex, and the right images by TEXTure.
  • Figure 3: Overview of RoCoTex: The concatenated image $I_{ij}$, its inpainting mask $M_{ij}$, the depth map $D_{ij}$, the normal map $N_{ij}$, the edge map $E_{ij}$, and the SDXL output $\hat{I}_{ij}$ all have the same size, whereas the local confidence maps $C_i$ and $C_j$, the local textures $T_i$ and $T_j$, the global confidence map $C^*$, and the global texture $T^*$ have the same size as one another.
  • Figure 4: In the confidence map $C_i$, pixels located on oblique triangles are given low confidence.
  • Figure 5: Gaussian blurring of inpainting mask: (a) This shows the left part of $M_{ij}$, i.e., $M_i$. (b) $M_i$ is blurred.
  • ...and 8 more figures
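
The ideas in the captions of Figures 4 and 5 can be illustrated with a small, dependency-free Python sketch. It is not the paper's implementation: the captions only state that pixels on oblique triangles receive low confidence and that the inpainting mask $M_i$ is Gaussian-blurred. The cosine-based confidence, the function names (`view_confidence`, `blur_mask`), and the `sigma` and `radius` values below are all assumptions chosen for illustration.

```python
import math


def view_confidence(normal, view_dir):
    """Confidence of a surface point seen from a camera.

    Assumption: confidence is the cosine of the angle between the surface
    normal and the direction toward the camera, clamped to [0, 1], so that
    oblique or back-facing triangles score low.
    """
    dot = sum(n * v for n, v in zip(normal, view_dir))
    n_len = math.sqrt(sum(n * n for n in normal))
    v_len = math.sqrt(sum(v * v for v in view_dir))
    if n_len == 0.0 or v_len == 0.0:
        return 0.0
    return max(0.0, min(1.0, dot / (n_len * v_len)))


def gaussian_kernel(sigma, radius):
    """Normalized 1D Gaussian weights for offsets -radius..radius."""
    weights = [math.exp(-(k * k) / (2.0 * sigma * sigma))
               for k in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def blur_rows(mask, kernel):
    """Convolve every row with the kernel, repeating border values."""
    radius = len(kernel) // 2
    out = []
    for row in mask:
        width = len(row)
        new_row = []
        for x in range(width):
            acc = 0.0
            for k, w in enumerate(kernel):
                # Clamp the sample index so the edge pixel is repeated.
                xi = min(max(x + k - radius, 0), width - 1)
                acc += w * row[xi]
            new_row.append(acc)
        out.append(new_row)
    return out


def transpose(mask):
    return [list(col) for col in zip(*mask)]


def blur_mask(mask, sigma=1.0, radius=2):
    """Separable Gaussian blur: horizontal pass, then vertical pass."""
    kernel = gaussian_kernel(sigma, radius)
    horizontal = blur_rows(mask, kernel)
    return transpose(blur_rows(transpose(horizontal), kernel))


# A hard-edged binary mask (0 = keep, 1 = inpaint) becomes a soft ramp.
hard_mask = [[0.0, 0.0, 1.0, 1.0] for _ in range(4)]
soft_mask = blur_mask(hard_mask)
```

With a softened mask of this kind, the transition between kept and regenerated pixels becomes gradual rather than abrupt, which is the general motivation for blurring a mask before inpainting; how RoCoTex uses the blurred mask in detail is not described in this excerpt.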