Refaçade: Editing Object with Given Reference Texture

Youze Huang, Penghui Ruan, Bojia Zi, Xianbiao Qi, Jianan Wang, Rong Xiao

TL;DR

The paper defines Object Retexture, a diffusion-based editing task that transfers texture from a reference object to a target object while preserving the target's geometry. It introduces Refaçade, a framework with two components: a texture remover that produces geometry-only conditioning, and a jigsaw permutation that breaks up the reference's global structure so that the transfer is driven by local texture. The method is trained in two stages on large-scale data and uses a distilled texture remover for fast inference. The paper reports state-of-the-art results on both image and video benchmarks, supported by automatic and human evaluations, and presents the approach as generalizing across object shapes and motions.

Abstract

Recent advances in diffusion models have brought remarkable progress in image and video editing, yet some tasks remain underexplored. In this paper, we introduce a new task, Object Retexture, which transfers local textures from a reference object to a target object in images or videos. To perform this task, a straightforward solution is to use ControlNet conditioned on the source structure and the reference texture. However, this approach suffers from limited controllability for two reasons: conditioning on the raw reference image introduces unwanted structural information, and it fails to disentangle the visual texture and structure information of the source. To address this problem, we propose Refaçade, a method that consists of two key designs to achieve precise and controllable texture transfer in both images and videos. First, we employ a texture remover trained on paired textured/untextured 3D mesh renderings to remove appearance information while preserving the geometry and motion of source videos. Second, we disrupt the reference global layout using a jigsaw permutation, encouraging the model to focus on local texture statistics rather than the global layout of the object. Extensive experiments demonstrate superior visual quality, precise editing, and controllability, outperforming strong baselines in both quantitative and human evaluations. Code is available at https://github.com/fishZe233/Refacade.

Paper Structure

This paper contains 33 sections, 3 equations, 17 figures, 11 tables.

Figures (17)

  • Figure 1: Visual results of Refaçade on videos.
  • Figure 2: Visual results of Refaçade on images.
  • Figure 3: The framework of our Refaçade. The training pipeline of Refaçade is shown on the left, and the model architecture is presented on the right.
  • Figure 4: Our data construction pipeline for the texture remover operates as follows: we collect object images, reconstruct 3D meshes, and render paired videos with and without textures under diverse camera trajectories and object motions.
  • Figure 5: Visualization of Jigsaw Permutation. We extract foreground patches from the reference image on the top-left corner, shuffle and flip them randomly, then rearrange them into a new layout. This destroys global spatial structure while preserving local texture patterns.
  • ...and 12 more figures
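
The jigsaw permutation described in the Figure 5 caption (extract foreground patches, shuffle and flip them randomly, rearrange them into a new layout) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function name `jigsaw_permute`, the patch size, the foreground threshold `fg_ratio`, and the choice to pack patches row by row from the top-left are all assumptions made for illustration.

```python
import numpy as np

def jigsaw_permute(image, mask, patch=16, fg_ratio=0.5, seed=None):
    """Shuffle and randomly flip foreground patches of a reference image.

    image: (H, W, C) array; mask: (H, W) boolean foreground mask.
    Returns the permuted image and the number of foreground patches kept.
    All parameter names and defaults are hypothetical.
    """
    rng = np.random.default_rng(seed)
    h, w = mask.shape
    rows, cols = h // patch, w // patch

    # Collect patches that lie mostly on the object (threshold is an assumption).
    patches = []
    for r in range(rows):
        for c in range(cols):
            ys, xs = r * patch, c * patch
            if mask[ys:ys + patch, xs:xs + patch].mean() >= fg_ratio:
                patches.append(image[ys:ys + patch, xs:xs + patch].copy())

    # Shuffle the patch order, flip each patch at random, and pack the
    # results row by row starting at the top-left (layout is an assumption).
    out = np.zeros_like(image)
    for slot, idx in enumerate(rng.permutation(len(patches))):
        p = patches[idx]
        if rng.random() < 0.5:
            p = p[:, ::-1]  # horizontal flip
        if rng.random() < 0.5:
            p = p[::-1, :]  # vertical flip
        r, c = divmod(slot, cols)
        out[r * patch:(r + 1) * patch, c * patch:(c + 1) * patch] = p
    return out, len(patches)
```

Because each patch is only moved and mirrored, the pixel values inside it (the local texture) are unchanged, while the object's outline and part arrangement no longer survive in the output. That is the property the abstract relies on when it says the permutation encourages the model to focus on local texture statistics rather than global layout.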