
Harnessing the Latent Diffusion Model for Training-Free Image Style Transfer

Kento Masui, Mayu Otani, Masahiro Nomura, Hideki Nakayama

TL;DR

Style Tracking Reverse Diffusion Process (STRDP) is a training-free style transfer algorithm for a pretrained Latent Diffusion Model (LDM). It performs style transfer in the latent space of the LDM, which reduces computational cost, and it is compatible with various LDMs.

Abstract

Diffusion models have recently shown the ability to generate high-quality images. However, controlling their generation process still poses challenges. One such challenge is image style transfer, the task of transferring the visual attributes of a style image to a content image. A typical obstacle in this task is the need for additional training of a pre-trained model. We propose a training-free style transfer algorithm, Style Tracking Reverse Diffusion Process (STRDP), for a pretrained Latent Diffusion Model (LDM). Our algorithm applies the Adaptive Instance Normalization (AdaIN) function in a distinct manner during the reverse diffusion process of an LDM while tracking the encoding history of the style image. Because it operates in the latent space of the LDM, the algorithm reduces computational cost and is compatible with various LDMs. Through a series of experiments and a user study, we show that our method can quickly transfer the style of an image without additional training. The speed, compatibility, and training-free nature of our algorithm facilitate agile experimentation with combinations of styles and LDMs for a wide range of applications.
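
For readers unfamiliar with AdaIN, its commonly used definition is reproduced below as background. The symbols $x$ (content activations), $y$ (style activations), $\mu$, and $\sigma$ are notation assumed here for illustration and are not taken from this page; the specific way STRDP applies the function inside the reverse diffusion process is described in the paper itself.

$$
\mathrm{AdaIN}(x, y) = \sigma(y)\,\frac{x - \mu(x)}{\sigma(x)} + \mu(y)
$$

Here $\mu(\cdot)$ and $\sigma(\cdot)$ denote the per-channel mean and standard deviation computed over the spatial dimensions, so the output keeps the spatial structure of $x$ while taking on the channel statistics of $y$.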

Paper Structure

This paper contains 31 sections, 4 equations, 8 figures, and 3 tables.

Figures (8)

  • Figure 1: Our image style transfer results. Our algorithm is able to transfer the visual style to a content image using a pre-trained latent diffusion model, without the need for additional training or heavy optimization. Unlike most existing approaches, our method preserves the original color of the content.
  • Figure 2: The architecture of our image style transfer with the style-tracking reverse diffusion process. We first add noise to the latent variables of the style and content images for $\mathrm{T}'$ steps. We keep a history of latent variables from the style image's forward diffusion process as $z_{s,t}$. In the reverse diffusion steps, we gather the CNN filter activation statistics in $\epsilon_\theta$ from the style image and transfer them to the corresponding content activations using AdaIN. This scheme allows us to transfer the image's style without training any module. We also visualize the latent variables $z_t$ and the predicted noise $\hat{\epsilon}_t$ involved in this architecture as colored images. A detailed diagram of $\tilde{\epsilon}_\theta$ is shown in Figure 3.
  • Figure 3: A diagram of $\tilde{\epsilon}_\theta$, which repeatedly applies AdaIN during the forward pass of the denoising U-Net. AdaIN is introduced at every convolutional layer to transfer filter activation statistics from a style image.
  • Figure 4: Qualitative comparisons of stylized images by baseline methods and ours. Our method has a texture transfer effect while preserving the color of a content image.
  • Figure 5: Visualization of the effect of $S$. The style effect becomes more apparent as $S$ increases, but so does the deformation of the content. This trade-off arises because a larger $S$ increases the number of reverse diffusion steps in the LDM.
  • ...and 3 more figures
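
The statistic transfer described in the captions of Figures 2 and 3 can be illustrated with a minimal NumPy sketch of AdaIN applied to a single pair of feature maps. This is not the authors' implementation: the function name `adain`, the `(C, H, W)` array layout, the `eps` stabilizer, and the random arrays standing in for U-Net activations are all assumptions made for illustration.

```python
import numpy as np


def adain(content, style, eps=1e-5):
    """Hypothetical AdaIN helper for feature maps shaped (C, H, W).

    Normalizes each content channel to zero mean and unit variance,
    then rescales it with the matching style channel's statistics.
    """
    # Per-channel statistics over the spatial axes (H, W).
    c_mean = content.mean(axis=(1, 2), keepdims=True)
    c_std = content.std(axis=(1, 2), keepdims=True)
    s_mean = style.mean(axis=(1, 2), keepdims=True)
    s_std = style.std(axis=(1, 2), keepdims=True)

    # eps (an assumption) guards against division by zero on flat channels.
    normalized = (content - c_mean) / (c_std + eps)
    return normalized * s_std + s_mean


# Random stand-ins for one layer's activations (assumed shapes).
rng = np.random.default_rng(0)
content_act = rng.normal(0.0, 1.0, size=(4, 8, 8))
style_act = rng.normal(2.0, 3.0, size=(4, 16, 16))

stylized = adain(content_act, style_act)
```

Because only per-channel means and standard deviations are exchanged, the style and content activations do not need the same spatial size, and the output keeps the content's spatial layout. In the method described above, an operation of this kind would be applied at each convolutional layer of the denoising U-Net, using style activations obtained from the tracked latents $z_{s,t}$.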