ZeST: Zero-Shot Material Transfer from a Single Image

Ta-Ying Cheng, Prafull Sharma, Andrew Markham, Niki Trigoni, Varun Jampani

TL;DR

ZeST addresses the problem of exemplar-based material transfer in 2D without training or explicit 3D geometry by leveraging a three-branch diffusion-guided pipeline: material encoding to obtain a latent $z_M$, depth-based geometry guidance via ControlNet to preserve input geometry, and latent illumination guidance through inpainting with a grayscale foreground initialization. The method injects $z_M$ into a pre-trained inpainting diffusion model through cross-attention, resulting in $I_{gen}$ that combines the exemplar’s material with the input’s geometry and lighting. Evaluations on real and synthetic datasets show improved material fidelity and photorealism over strong baselines while remaining entirely training-free, and the framework supports multiple object edits and lighting-aware variations. ZeST thus provides a scalable, practical tool for artists and graphics pipelines, with potential extensions to exemplar-based 3D texturing and relighting in untextured mesh renderings.
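
The cross-attention injection of $z_M$ described above can be illustrated with a small NumPy sketch. This is not the authors' implementation: the projection matrices `w_q`, `w_k`, `w_v`, the token count, and all dimensions are hypothetical placeholders, and a real adapter would apply this inside each attention layer of the pre-trained diffusion model. It only shows the core idea that queries come from the image latents while keys and values come from the material latent.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(latents, z_m, w_q, w_k, w_v):
    """Attend from image latents to material tokens.

    latents: (n_positions, d_latent) flattened spatial features
    z_m:     (n_tokens, d_material) material latent (assumed token layout)
    """
    q = latents @ w_q                      # queries from the image being edited
    k = z_m @ w_k                          # keys from the material exemplar
    v = z_m @ w_v                          # values from the material exemplar
    weights = softmax(q @ k.T / np.sqrt(q.shape[-1]))
    return weights @ v, weights

rng = np.random.default_rng(0)
latents = rng.normal(size=(64, 32))        # e.g. a flattened 8x8 latent grid
z_m = rng.normal(size=(4, 16))             # hypothetical: 4 material tokens
w_q = rng.normal(size=(32, 24))
w_k = rng.normal(size=(16, 24))
w_v = rng.normal(size=(16, 24))

out, weights = cross_attention(latents, z_m, w_q, w_k, w_v)
print(out.shape, weights.shape)
```

Each spatial position receives a weighted mixture of the material tokens, which is how exemplar information can reach every pixel of the object without any retraining of the base model.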

Abstract

We propose ZeST, a method for zero-shot material transfer to an object in the input image given a material exemplar image. ZeST leverages existing diffusion adapters to extract an implicit material representation from the exemplar image. This representation is used to transfer the material onto the object in the input image with a pre-trained inpainting diffusion model, using depth estimates as the geometry cue and grayscale object shading as the illumination cue. The method works on real images without any training, making it a zero-shot approach. Both qualitative and quantitative results on real and synthetic datasets demonstrate that ZeST outputs photorealistic images with transferred materials. We also show applications of ZeST to multiple edits in a single image and to robust material assignment under different illuminations. Project Page: https://ttchengab.github.io/zest

Paper Structure

This paper contains 17 sections, 2 equations, 11 figures, 1 table.

Figures (11)

  • Figure 1: Overview. We present ZeST, a zero-shot single-image approach to (a) transfer material from an exemplar image to an object in the input image. (b) ZeST can easily be extended to perform multiple material edits in a single image, and (c) perform implicit lighting-aware edits on a rendering of a textured mesh.
  • Figure 2: ZeST Architecture. Given a material exemplar $M$ and an input image $I$, we first encode the material exemplar with an image encoder (e.g., IP-Adapter). Concurrently, we convert the input image into a depth map $D_I$ and a foreground-grayscaled image $I_{init}$ to feed into the geometry and latent illumination guidance branches, respectively. By combining the two sources of guidance with the latent features from the material encoding, ZeST can transfer the material properties onto the object in the input image while preserving all other attributes.
  • Figure 3: The design choice of IP-Adapter with ControlNet. Given the material exemplar and the input image, we examine different ways of using the IP-Adapter. In particular, we find that an Img2Img + text module (a) does not properly transfer the material to the main object. ControlNet (b), on the other hand, preserves the geometry of the given input. We thus use it as the starting point for geometry guidance and go on to explore the best illumination cues.
  • Figure 4: Ablating input for illumination guidance. To validate our design choice of the foreground-grayscale image for initializing inpainting, we compare the generated results against using the original image and random noise as inputs. The original image presents a strong base color prior that perturbs the generation, while random noise discards shading information, leading to incorrect lighting in both examples.
  • Figure 5: Qualitative results on diverse materials. We present results of material transfer from a diverse set of material exemplar images. Even when perturbed by lighting and complex geometry, ZeST can still isolate the material information from the exemplar image and transfer it to various objects while preserving the original geometry and illumination conditions. Note the change in specular regions as shinier materials are chosen, as with the car made of brass and the dinosaur made of shiny steel.
  • ...and 6 more figures
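
The three inpainting initializations compared in Figure 4 (foreground-grayscale $I_{init}$, the original image, and random noise) can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' code: the function names are hypothetical, the object mask is assumed to be given, and the Rec. 601 luma weights are one common grayscale conversion that the source does not specify.

```python
import numpy as np

# Assumption: Rec. 601 luma weights; the source does not name a conversion.
LUMA = np.array([0.299, 0.587, 0.114])

def foreground_grayscale(image, mask):
    """Grayscale the masked object, keep the background untouched.

    image: (H, W, 3) float array in [0, 1]
    mask:  (H, W) bool array, True on the object to edit
    """
    luma = image @ LUMA                                # (H, W) shading
    gray = np.repeat(luma[..., None], 3, axis=-1)      # back to 3 channels
    return np.where(mask[..., None], gray, image)

def foreground_noise(image, mask, rng):
    """Ablation variant: replace the masked object with uniform noise."""
    noise = rng.random(image.shape)
    return np.where(mask[..., None], noise, image)

rng = np.random.default_rng(0)
image = rng.random((4, 4, 3))                          # stand-in input image I
mask = np.zeros((4, 4), dtype=bool)
mask[1:3, 1:3] = True                                  # stand-in object mask

inits = {
    "grayscale": foreground_grayscale(image, mask),    # keeps shading, drops color
    "original": image,                                 # keeps the base color prior
    "noise": foreground_noise(image, mask, rng),       # drops shading entirely
}
for name, init in inits.items():
    print(name, init.shape)
```

The grayscale variant removes the object's base color while retaining per-pixel brightness, which is consistent with the caption's observation that it preserves shading cues without imposing a color prior.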