Dynamic Texture Transfer using PatchMatch and Transformers

Guo Pu, Shiyao Xu, Xixin Cao, Zhouhui Lian

TL;DR

The paper tackles one-shot dynamic texture transfer: animating a target still image with the dynamic texture of a source video while preserving the target's structure. It proposes DynTexture, a two-stage approach. The first stage synthesizes the initial frame with distance-map-guided PatchMatch. The second stage cuts that frame into patches, encodes them with a VQ-VAE, predicts the subsequent patches with a Transformer operating on long discrete sequences, and merges the predicted patches into frames with a Gaussian weighted average. Key contributions are the distance-map guidance for PatchMatch, a patch-based, structure-agnostic sequence forecasting pipeline, and discrete latent modeling via VQ-VAE coupled with Transformer prediction to produce temporally coherent frames. The authors report results superior to the state of the art on dynamic text effects transfer and moving-texture scenarios, and note that the modular design extends to layout modification and image animation tasks.

Abstract

How to automatically transfer the dynamic texture of a given video to the target still image is a challenging and ongoing problem. In this paper, we propose to handle this task via a simple yet effective model that utilizes both PatchMatch and Transformers. The key idea is to decompose the task of dynamic texture transfer into two stages, where the start frame of the target video with the desired dynamic texture is synthesized in the first stage via a distance map guided texture transfer module based on the PatchMatch algorithm. Then, in the second stage, the synthesized image is decomposed into structure-agnostic patches, according to which their corresponding subsequent patches can be predicted by exploiting the powerful capability of Transformers equipped with VQ-VAE for processing long discrete sequences. After getting all those patches, we apply a Gaussian weighted average merging strategy to smoothly assemble them into each frame of the target stylized video. Experimental results demonstrate the effectiveness and superiority of the proposed method in dynamic texture transfer compared to the state of the art.
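
The final merging step can be illustrated with a small sketch. The code below is not the authors' implementation; it only shows the general idea of a Gaussian weighted average over overlapping patches, contrasted with a plain average. The function names (`gaussian_window`, `merge_patches`), the square grayscale patches, and the `sigma` value are all assumptions made for illustration.

```python
import math

def gaussian_window(size, sigma):
    """2D Gaussian weights that peak at the patch centre (illustrative helper)."""
    c = (size - 1) / 2.0
    return [[math.exp(-((y - c) ** 2 + (x - c) ** 2) / (2.0 * sigma ** 2))
             for x in range(size)] for y in range(size)]

def merge_patches(patches, positions, height, width, sigma=None):
    """Blend overlapping square patches into one frame.

    patches:   list of size x size grids of grayscale values
    positions: list of (top, left) corners, one per patch
    sigma:     Gaussian width (assumed parameter); None means a plain average
    """
    size = len(patches[0])
    if sigma is None:
        weights = [[1.0] * size for _ in range(size)]
    else:
        weights = gaussian_window(size, sigma)
    acc = [[0.0] * width for _ in range(height)]   # weighted sum of pixel values
    norm = [[0.0] * width for _ in range(height)]  # sum of weights per pixel
    for patch, (top, left) in zip(patches, positions):
        for dy in range(size):
            for dx in range(size):
                w = weights[dy][dx]
                acc[top + dy][left + dx] += w * patch[dy][dx]
                norm[top + dy][left + dx] += w
    # Pixels not covered by any patch are left at 0.
    return [[acc[y][x] / norm[y][x] if norm[y][x] > 0 else 0.0
             for x in range(width)] for y in range(height)]

# Two 4x4 patches (one dark, one bright) that overlap by two columns.
dark = [[0.0] * 4 for _ in range(4)]
bright = [[1.0] * 4 for _ in range(4)]
plain = merge_patches([dark, bright], [(0, 0), (0, 2)], 4, 6)
smooth = merge_patches([dark, bright], [(0, 0), (0, 2)], 4, 6, sigma=1.0)
```

In this toy case the plain average gives a flat 0.5 across the whole overlap, so the seam shows up as a step on each side. The Gaussian weighting favours whichever patch has its centre closer to the pixel, so the overlap ramps from the dark patch toward the bright one.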

Paper Structure

This paper contains 18 sections, 7 equations, 11 figures, and 2 tables.

Figures (11)

  • Figure 1: An example of dynamic texture transfer. Given a sample video and a target still image, the proposed method is able to synthesize the target video by transferring the dynamic texture of the sample video into the target image.
  • Figure 2: Using distance information to guide the PatchMatch algorithm, so that information flows outward from the boundary.
  • Figure 3: Overview of the proposed DynTexture, which is designed as a two-stage architecture, where the distance map guided texture rendering module generates the initial frame, and the novel deep sequence forecasting module predicts and synthesizes the subsequent frames based on the previously-synthesized initial frame.
  • Figure 4: Comparison between simple average and Gaussian weighted average merging strategies. The Gaussian weighted average yields visibly higher-quality results.
  • Figure 5: Our results on flame dynamic effects transfer.
  • ...and 6 more figures
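
The caption of Figure 2 refers to distance information measured from a shape boundary. As a rough illustration of what such a distance map could look like, the sketch below computes, for every pixel of a binary mask, the number of steps to the nearest boundary pixel using a multi-source breadth-first search. This summary does not say how the paper defines or computes its distance map, so the city-block metric, the unsigned distance, the boundary definition, and the function name `boundary_distance_map` are all assumptions.

```python
from collections import deque

def boundary_distance_map(mask):
    """City-block distance from each pixel to the nearest shape-boundary pixel.

    mask: grid of 0/1 values, 1 = inside the shape (for example a glyph).
    A boundary pixel is assumed to be a shape pixel with at least one
    4-neighbour that lies outside the shape or off the grid.
    """
    h, w = len(mask), len(mask[0])
    steps = ((1, 0), (-1, 0), (0, 1), (0, -1))

    def is_boundary(y, x):
        if not mask[y][x]:
            return False
        for dy, dx in steps:
            ny, nx = y + dy, x + dx
            if not (0 <= ny < h and 0 <= nx < w) or not mask[ny][nx]:
                return True
        return False

    dist = [[None] * w for _ in range(h)]
    queue = deque()
    # Seed the search with every boundary pixel at distance 0.
    for y in range(h):
        for x in range(w):
            if is_boundary(y, x):
                dist[y][x] = 0
                queue.append((y, x))
    # Expand outward (and inward) one step at a time.
    while queue:
        y, x = queue.popleft()
        for dy, dx in steps:
            ny, nx = y + dy, x + dx
            if 0 <= ny < h and 0 <= nx < w and dist[ny][nx] is None:
                dist[ny][nx] = dist[y][x] + 1
                queue.append((ny, nx))
    return dist

# A 3x3 square shape centred in a 5x5 grid.
mask = [[1 if 1 <= y <= 3 and 1 <= x <= 3 else 0 for x in range(5)]
        for y in range(5)]
dist = boundary_distance_map(mask)
```

A map like this gives each pixel a notion of how far it sits from the shape outline, which is the kind of signal a guided patch search could use to match patches at comparable distances from the boundary.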