Directional Texture Editing for 3D Models

Shengqi Liu, Zhuo Chen, Jingnan Gao, Yichao Yan, Wenhan Zhu, Jiangjing Lyu, Xiaokang Yang

TL;DR

This work proposes ITEM3D, a Texture Editing Model for automatic 3D object editing according to text Instructions. It uses rendered images as the bridge between text and the 3D representation, and optimizes the disentangled texture and environment map.

Abstract

Texture editing is a crucial task in 3D modeling that allows users to automatically manipulate the surface materials of 3D models. However, the inherent complexity of 3D models and the ambiguity of text descriptions make this task challenging. To address this challenge, we propose ITEM3D, a Texture Editing Model designed for automatic 3D object editing according to text Instructions. Leveraging diffusion models and differentiable rendering, ITEM3D takes rendered images as the bridge between text and the 3D representation, and further optimizes the disentangled texture and environment map. Previous methods adopted an absolute editing direction, namely score distillation sampling (SDS), as the optimization objective, which unfortunately results in noisy appearance and text inconsistency. To solve the problem caused by ambiguous text, we introduce a relative editing direction, an optimization objective defined by the noise difference between the source and target texts, to resolve the semantic ambiguity between texts and images. Additionally, we gradually adjust the direction during optimization to further address unexpected deviation in the texture domain. Qualitative and quantitative experiments show that ITEM3D outperforms state-of-the-art methods on various 3D objects. We also perform text-guided relighting to show explicit control over lighting. Our project page: https://shengqiliu1.github.io/ITEM3D.
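
As a rough sketch of the contrast between the absolute and relative editing directions, the two objectives can be written in the usual SDS notation. The symbols are assumptions, not necessarily the paper's own: $\theta$ denotes the optimized texture and environment map, $z$ the rendered image, $z_t$ its noised version at timestep $t$, $\epsilon$ the sampled noise, $\epsilon_\phi$ the noise predicted by the diffusion U-Net, $w(t)$ a timestep weight, and $y_{\mathrm{src}}$, $y_{\mathrm{tgt}}$ the source and target texts.

```latex
% Absolute direction (SDS): predicted noise minus the sampled noise
\nabla_\theta \mathcal{L}_{\mathrm{SDS}}
  = \mathbb{E}_{t,\epsilon}\!\left[ w(t)\,
      \bigl(\epsilon_\phi(z_t; y_{\mathrm{tgt}}, t) - \epsilon\bigr)\,
      \frac{\partial z}{\partial \theta} \right]

% Relative direction (assumed form): target-conditioned minus source-conditioned prediction
\nabla_\theta \mathcal{L}_{\mathrm{rel}}
  = \mathbb{E}_{t,\epsilon}\!\left[ w(t)\,
      \bigl(\epsilon_\phi(z_t; y_{\mathrm{tgt}}, t) - \epsilon_\phi(z_t; y_{\mathrm{src}}, t)\bigr)\,
      \frac{\partial z}{\partial \theta} \right]
```

Under this reading, the only change is the reference term: SDS subtracts the sampled noise, while the relative direction subtracts the source-conditioned prediction, so any error shared by both predictions cancels.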

Paper Structure (20 sections, 15 equations, 10 figures, 1 table)

Figures (10)

  • Figure 1: Motivation. (a) Previous methods [DBLP:journals/corr/abs-2209-14988, chen2023fantasia3d] use the SDS loss to directly guide the optimization, which leads to ambiguous details due to the bias between texts and images (red line). Our method instead introduces the relative direction between the source and target texts into the optimization process, eliminating the bias and improving the rendering results (green line). (b) Optimization in the texture domain causes deviation from the target direction (red line), so we gradually adjust the direction to fine-tune the optimization (folded green line).
  • Figure 2: Pipeline of 3D model texture editing. We render the 3D model, with its mesh, texture, and environment map, into 2D images, to which noise $\epsilon$ is then added. We then use the target text and the gradually adjusted source text as separate conditions to predict the added noise via the U-Net. The difference between the two predicted noises serves as the relative direction that guides the optimization of the materials and environment map.
  • Figure 3: Performance on real-world objects. ITEM3D successfully transforms the original cat toy into a vegetable tiger toy, the piggy doll into a porcelain pig, and the sneaker into a golden sneaker with remarkable quality. In contrast, Text2Tex [chen2023text2tex] and TEXTure [richardson2023texture] both generate noisy textures, resulting in a low-quality rendered appearance. Instruct-NeRF2NeRF [haque2023instruct] achieves a natural and text-consistent appearance, but it fails to edit the material of the objects. Note that the text instructions given to Instruct-NeRF2NeRF differ slightly from those given to the other methods, i.e., "Make it into a vegetable tiger toy", "Make it into a porcelain piggy toy" and "Make it into a golden sneaker", due to its special requirement.
  • Figure 4: Qualitative comparison on the NeRF synthetic dataset. We compare our method with the state-of-the-art approach Instruct-NeRF2NeRF (IN2N) [haque2023instruct] and a simple SDS-based method. While both ITEM3D and IN2N demonstrate prompt-consistent editing, the SDS-based method fails to yield satisfactory outcomes. However, IN2N loses the original chair patterns and cannot faithfully represent natural wood patterns. Moreover, it lacks the precision to accurately discern the object to be edited.
  • Figure 5: Ablation study of direction adjustment. The results without adjustment show a weird appearance, e.g., a pale body for the cattle and two heads for the duck. With gradual adjustment, these unrealistic artifacts are removed, leading to a natural appearance.
  • ...and 5 more figures
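
The noise-and-predict step described in the Figure 2 caption can be sketched with a toy example. This is a minimal illustration under stated assumptions, not the authors' implementation: `toy_unet` is a hypothetical placeholder for the text-conditioned U-Net, the text embeddings are plain arrays with the same shape as the image, and the rendered view is random data.

```python
import numpy as np

def add_noise(image, eps, alpha_bar):
    """Standard forward diffusion step: mix the image with Gaussian noise."""
    return np.sqrt(alpha_bar) * image + np.sqrt(1.0 - alpha_bar) * eps

def toy_unet(noisy_image, text_emb):
    """Hypothetical stand-in for the text-conditioned U-Net (not a real model)."""
    return 0.5 * noisy_image + text_emb

def sds_direction(image, eps, alpha_bar, target_emb, unet=toy_unet):
    """Absolute direction: predicted noise minus the noise that was added."""
    noisy = add_noise(image, eps, alpha_bar)
    return unet(noisy, target_emb) - eps

def relative_direction(image, eps, alpha_bar, source_emb, target_emb, unet=toy_unet):
    """Relative direction: target-conditioned minus source-conditioned prediction."""
    noisy = add_noise(image, eps, alpha_bar)
    return unet(noisy, target_emb) - unet(noisy, source_emb)

rng = np.random.default_rng(0)
image = rng.random((4, 4))            # assumed stand-in for one rendered view
eps = rng.standard_normal((4, 4))     # sampled noise
source_emb = np.full((4, 4), 0.1)     # toy "source text" condition
target_emb = np.full((4, 4), 0.3)     # toy "target text" condition

d_rel = relative_direction(image, eps, 0.5, source_emb, target_emb)
d_sds = sds_direction(image, eps, 0.5, target_emb)
```

With this toy predictor, the term shared by both predictions cancels in `d_rel`, leaving only the difference between the two conditions, whereas `d_sds` still depends on the sampled noise. A real U-Net would not cancel exactly; the example only illustrates why subtracting a source-conditioned prediction can reduce noise in the guidance signal. In the full pipeline, this direction would be backpropagated through the differentiable renderer to the texture and environment map.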