TexEditor: Structure-Preserving Text-Driven Texture Editing

Bo Zhao, Yihang Liu, Chenfeng Zhang, Huan Yang, Kun Gai, Wei Ji

Abstract

Text-guided texture editing aims to modify object appearance while preserving the underlying geometric structure. However, our empirical analysis reveals that even state-of-the-art (SOTA) editing models frequently struggle to maintain structural consistency during texture editing, even though the intended changes are purely appearance-related. Motivated by this observation, we enhance structure preservation from both the data and the training perspectives and build TexEditor, a dedicated texture editing model based on Qwen-Image-Edit-2509. First, we construct TexBlender, a high-quality supervised fine-tuning (SFT) dataset generated with Blender, which provides strong structural priors for a cold start. Second, we introduce StructureNFT, a reinforcement learning (RL)-based approach that integrates structure-preserving losses to transfer the structural priors learned during SFT to real-world scenes. Moreover, because existing benchmarks offer limited realism and evaluation coverage, we introduce TexBench, a general-purpose real-world benchmark for text-guided texture editing. Extensive experiments on existing Blender-based texture benchmarks and on TexBench show that TexEditor consistently outperforms strong baselines such as Nano Banana Pro. We also evaluate TexEditor on the general-purpose benchmark ImgEdit to validate its generalization. Our code and data are available at https://github.com/KlingAIResearch/TexEditor.
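
The abstract states that StructureNFT integrates structure-preserving losses, and Figure 4 compares structure extractors and distance metrics for a structure consistency loss. The Python sketch below illustrates that general idea under loud assumptions. It uses a Sobel gradient-magnitude map as a stand-in structure extractor, max-normalizes it, and offers L1 and cosine distances. The names edge_map and structure_loss, the extractor, and the normalization are all hypothetical and are not the paper's implementation. The sketch requires NumPy.

```python
import numpy as np

# Hypothetical structure extractor: Sobel kernels for horizontal/vertical gradients.
SOBEL_X = np.array([[-1.0, 0.0, 1.0],
                    [-2.0, 0.0, 2.0],
                    [-1.0, 0.0, 1.0]])
SOBEL_Y = SOBEL_X.T


def edge_map(gray: np.ndarray) -> np.ndarray:
    """Gradient-magnitude map of a 2-D grayscale image (stand-in structure extractor)."""
    img = np.asarray(gray, dtype=np.float64)
    padded = np.pad(img, 1, mode="edge")  # replicate borders so flat regions give zero gradient
    h, w = img.shape
    gx = np.zeros((h, w))
    gy = np.zeros((h, w))
    # 3x3 cross-correlation written as a sum of shifted windows.
    for i in range(3):
        for j in range(3):
            window = padded[i:i + h, j:j + w]
            gx += SOBEL_X[i, j] * window
            gy += SOBEL_Y[i, j] * window
    return np.hypot(gx, gy)


def structure_loss(source: np.ndarray, edited: np.ndarray,
                   metric: str = "l1", eps: float = 1e-8) -> float:
    """Distance between max-normalized edge maps; 0 means identical structure."""
    a = edge_map(source)
    b = edge_map(edited)
    # Normalizing by the peak response makes the loss ignore global contrast/offset changes.
    a = a / (a.max() + eps)
    b = b / (b.max() + eps)
    if metric == "l1":
        return float(np.mean(np.abs(a - b)))
    if metric == "cosine":
        denom = np.linalg.norm(a) * np.linalg.norm(b) + eps
        return float(1.0 - np.sum(a * b) / denom)
    raise ValueError(f"unknown metric: {metric!r}")


# Usage: a recolored square keeps its structure, a shifted square does not.
source = np.zeros((16, 16))
source[4:12, 4:12] = 1.0
recolored = 0.5 * source + 0.2   # appearance change only
shifted = np.zeros((16, 16))
shifted[6:14, 6:14] = 1.0        # geometry change
print(structure_loss(source, recolored), structure_loss(source, shifted))
```

The recolored image yields a loss near zero, while the shifted one is penalized. A plain edge map also reacts to new texture patterns such as stripes or grain, so it is only a rough proxy for geometric structure.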

Paper Structure (39 sections, 9 equations, 14 figures, 5 tables, 1 algorithm)

Figures (14)

  • Figure 1: Existing image editing models often struggle to preserve structural consistency with the original image during texture editing, and instead tend to regenerate the edited subject.
  • Figure 2: The training pipeline of TexEditor. (a) The model undergoes Supervised Fine-Tuning (SFT) on a synthetic dataset built by rendering 3D assets in Blender. (b) It is then optimized via Reinforcement Learning (RL) with a multi-faceted reward system that uses Gemini for instruction alignment and structural analysis to balance texture editing against structural preservation.
  • Figure 3: Generation of image-instruction pairs in TexBlender. We first identify target object groups within filtered 3D scenes. Visual variations are created via (b1) Attribute Editing (adjusting shader parameters) or (b2) Texture Replacement (applying MatSynth textures). Finally, (b3) an Instruction Refiner leverages visual cues (difference images) and segmentation masks (SAM3) to generate precise, grounded text instructions that accurately describe the texture changes.
  • Figure 4: Visualization of different structure extractors and distance metrics in the structure consistency loss.
  • Figure 5: Qualitative comparison of different evaluation metrics. Results favored by the Gemini, structural, and TexEval scores are shown in green, blue, and red, respectively; check marks indicate human preference. TexEval balances structure and semantics and aligns best with human preference.
  • ...and 9 more figures
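
Figure 2 describes an RL reward that balances instruction alignment, with Gemini among its signals, against structural preservation. Figure 5 describes TexEval as balancing structure and semantics. The snippet below is a minimal sketch of that trade-off under stated assumptions:

- a linear weighting,
- an alignment score already scaled to [0, 1],
- a structural distance mapped to a similarity via 1 / (1 + d).

The class Candidate, the function combined_reward, the default weight, and all scores are hypothetical. This is neither the paper's reward nor the TexEval formula.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    alignment: float           # instruction-alignment score in [0, 1] (assumed scale, e.g. from a VLM judge)
    structure_distance: float  # structural distance to the source image, >= 0 (assumed)


def combined_reward(c: Candidate, weight: float = 0.5) -> float:
    """Hypothetical reward: convex mix of instruction alignment and structural similarity."""
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    if c.structure_distance < 0.0:
        raise ValueError("structure_distance must be non-negative")
    structure_score = 1.0 / (1.0 + c.structure_distance)  # maps [0, inf) onto (0, 1]
    return weight * c.alignment + (1.0 - weight) * structure_score


# Usage: three hypothetical edits of the same source image for one texture instruction.
candidates = [
    Candidate("regenerated", alignment=0.95, structure_distance=1.5),  # follows text, redraws object
    Candidate("faithful", alignment=0.85, structure_distance=0.1),     # follows text, keeps geometry
    Candidate("unchanged", alignment=0.10, structure_distance=0.0),    # ignores the instruction
]
for c in candidates:
    print(f"{c.name:12s} {combined_reward(c):.3f}")
best = max(candidates, key=combined_reward)
print("preferred:", best.name)
```

With equal weighting, the edit that follows the instruction while keeping the geometry ranks above both the edit that redraws the object (the failure mode shown in Figure 1) and the edit that ignores the instruction. A purely additive mix can still reward large structural drift if the alignment gain is big enough, so a multiplicative or gated combination is an alternative design choice.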