Dynamic Texture Transfer using PatchMatch and Transformers
Guo Pu, Shiyao Xu, Xixin Cao, Zhouhui Lian
TL;DR
The paper tackles one-shot dynamic texture transfer: animating a target still image with the texture of a source video while preserving the target's structure. It proposes DynTexture, a two-stage approach. The first stage synthesizes the initial frame with distance-map guided PatchMatch. The second stage predicts subsequent frames by cutting the initial frame into patches, encoding them with a VQ-VAE, forecasting the resulting long discrete sequences with a Transformer, and merging the predicted patches with Gaussian weighting. Key contributions are the distance-map guidance for PatchMatch, a patch-based, structure-agnostic sequence forecasting pipeline, and discrete latent modeling via VQ-VAE coupled with Transformer prediction to produce temporally coherent frames. The authors report results superior to prior methods on dynamic text effects transfer and moving texture scenarios, and the modular design extends to layout modification and image animation tasks.
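To make the "discrete latent" step concrete, here is a minimal sketch of the nearest-codebook lookup at the core of VQ-VAE quantization. It is not the authors' implementation: the function name `quantize`, the toy codebook, and the toy latent vectors are all assumptions made for illustration. Each continuous vector is replaced by the index of its closest codebook entry, which gives a token sequence that a Transformer can model.

```python
def quantize(vectors, codebook):
    """Map each vector to the index of its nearest codebook entry (squared L2)."""
    def sqdist(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b))
    return [min(range(len(codebook)), key=lambda k: sqdist(v, codebook[k]))
            for v in vectors]

# Toy values, invented for illustration only.
codebook = [[0.0, 0.0], [1.0, 1.0], [0.0, 2.0]]
latents = [[0.1, -0.1], [0.9, 1.2], [0.2, 1.8]]  # stand-ins for encoder outputs
tokens = quantize(latents, codebook)             # one discrete token per vector
```

In a trained VQ-VAE the codebook is learned jointly with the encoder and decoder; only the lookup itself is shown here.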
Abstract
Automatically transferring the dynamic texture of a given video to a target still image is a challenging, open problem. In this paper, we propose to handle this task with a simple yet effective model that uses both PatchMatch and Transformers. The key idea is to decompose dynamic texture transfer into two stages. In the first stage, the start frame of the target video with the desired dynamic texture is synthesized by a distance map guided texture transfer module based on the PatchMatch algorithm. In the second stage, the synthesized image is decomposed into structure-agnostic patches, from which the corresponding subsequent patches are predicted by exploiting the capability of Transformers equipped with VQ-VAE to process long discrete sequences. Once all patches are obtained, we apply a Gaussian weighted average merging strategy to assemble them smoothly into each frame of the target stylized video. Experimental results demonstrate the effectiveness and superiority of the proposed method in dynamic texture transfer compared to the state of the art.
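The Gaussian weighted average merging step can be sketched as follows. This is an illustrative sketch under stated assumptions, not the paper's code: patches are assumed to be square and grayscale, the names `gaussian_weights` and `merge_patches` are hypothetical, and the default `sigma = size / 4` is an arbitrary choice. Each patch contributes to a pixel in proportion to a Gaussian window centered on the patch, so patch borders fade out where patches overlap.

```python
import math

def gaussian_weights(size, sigma):
    """2D Gaussian window (size x size), peaked at the patch center."""
    c = (size - 1) / 2.0
    return [[math.exp(-((y - c) ** 2 + (x - c) ** 2) / (2.0 * sigma ** 2))
             for x in range(size)] for y in range(size)]

def merge_patches(patches, positions, height, width, size, sigma=None):
    """Blend overlapping square patches into one frame by weighted averaging.

    patches:   list of size x size nested lists of floats (grayscale assumed)
    positions: list of (top, left) corners, one per patch
    """
    if sigma is None:
        sigma = size / 4.0  # hypothetical default, not taken from the paper
    w = gaussian_weights(size, sigma)
    acc = [[0.0] * width for _ in range(height)]   # weighted sum of pixel values
    norm = [[0.0] * width for _ in range(height)]  # sum of weights per pixel
    for patch, (top, left) in zip(patches, positions):
        for dy in range(size):
            for dx in range(size):
                y, x = top + dy, left + dx
                if 0 <= y < height and 0 <= x < width:
                    acc[y][x] += w[dy][dx] * patch[dy][dx]
                    norm[y][x] += w[dy][dx]
    # Pixels covered by no patch are left at 0.0.
    return [[acc[y][x] / norm[y][x] if norm[y][x] > 0 else 0.0
             for x in range(width)] for y in range(height)]

# Usage: two constant 4x4 patches overlapping by two columns in a 4x6 frame.
dark = [[1.0] * 4 for _ in range(4)]
bright = [[3.0] * 4 for _ in range(4)]
frame = merge_patches([dark, bright], [(0, 0), (0, 2)], height=4, width=6, size=4)
```

In the overlap region, the value moves gradually from the first patch's intensity toward the second's instead of jumping at a seam, which is the motivation for weighting by distance from the patch center.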
