Pluggable Style Representation Learning for Multi-Style Transfer

Hongda Liu, Longguang Wang, Weijun Guan, Ye Zhang, Yulan Guo

TL;DR

Multi-style transfer often faces a trade-off between broad style coverage and inference efficiency. This work decouples style modeling from transfer by learning compact 16-dim style representations stored in a Style Codebook (SCB) and a Style-aware Multi-style Transfer (SaMST) network that uses pluggable style representations to condition a unified generator. It introduces three style-conditioned components—SConv, SRAdaIN, and SCM—and an incremental training scheme to add new styles without forgetting, achieving over $4\times$ model-size reduction and over $3\times$ per-style speedup while maintaining or improving stylization quality. Quantitative results on standard benchmarks show state-of-the-art ArtFID, content fidelity, and global/local stylization metrics, with qualitative results illustrating sharper details and better content preservation. The approach enables scalable, edge-friendly multi-style transfer, facilitating practical deployment with easy extension to new styles.

Abstract

Due to the high diversity of image styles, scalability to various styles plays a critical role in real-world applications. To accommodate a large number of styles, previous multi-style transfer approaches rely on enlarging the model size, while arbitrary-style transfer methods utilize heavy backbones. However, the additional computational cost introduced by more model parameters hinders the deployment of these methods on resource-limited devices. To address this challenge, we develop a style transfer framework that decouples style modeling from style transferring. Specifically, for style modeling, we propose a style representation learning scheme to encode the style information into a compact representation. Then, for style transferring, we develop a style-aware multi-style transfer network (SaMST) to adapt to diverse styles using pluggable style representations. In this way, our framework is able to accommodate diverse image styles in the learned style representations without introducing additional overhead during inference, thereby maintaining efficiency. Experiments show that our style representation can extract accurate style information. Moreover, qualitative and quantitative results demonstrate that our method achieves state-of-the-art performance in terms of both accuracy and efficiency. The code is available at https://github.com/The-Learning-And-Vision-Atelier-LAVA/SaMST.
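
To make the idea of pluggable style representations more concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: the `StyleCodebook` class, the `modulate` function, and the per-channel linear weights are hypothetical names introduced only for illustration, and the conditioning shown is a generic AdaIN-like scale and shift rather than the paper's SConv, SRAdaIN, or SCM. The only detail taken from the text is that each style is stored as a 16-dimensional vector in a codebook. In the paper the vectors are learned; here they are randomly initialised.

```python
import math
import random

STYLE_DIM = 16  # size of each style representation, as stated in the TL;DR


class StyleCodebook:
    """Toy codebook (hypothetical): one compact vector per style, looked up by name."""

    def __init__(self, seed=0):
        self._rng = random.Random(seed)
        self._codes = {}

    def add_style(self, name):
        # Registering a style only adds a new vector; existing entries are untouched.
        # Assumption: random initialisation stands in for the learned representation.
        if name in self._codes:
            raise KeyError(f"style {name!r} already registered")
        self._codes[name] = [self._rng.gauss(0.0, 1.0) for _ in range(STYLE_DIM)]

    def lookup(self, name):
        return list(self._codes[name])

    def num_parameters(self):
        # Storage grows by STYLE_DIM floats per style, independent of the generator.
        return len(self._codes) * STYLE_DIM


def instance_norm(channel, eps=1e-5):
    """Normalise one feature channel (flat list of floats) to zero mean, unit variance."""
    mean = sum(channel) / len(channel)
    var = sum((x - mean) ** 2 for x in channel) / len(channel)
    return [(x - mean) / math.sqrt(var + eps) for x in channel]


def modulate(features, code, weights):
    """Generic AdaIN-like conditioning (an assumption, not the paper's SRAdaIN).

    features: list of channels, each a flat list of floats.
    weights:  per channel, a pair (w_scale, w_shift) of STYLE_DIM-length lists.
    """
    out = []
    for channel, (w_scale, w_shift) in zip(features, weights):
        scale = 1.0 + sum(w * s for w, s in zip(w_scale, code))
        shift = sum(w * s for w, s in zip(w_shift, code))
        out.append([scale * x + shift for x in instance_norm(channel)])
    return out


codebook = StyleCodebook()
codebook.add_style("style_a")
codebook.add_style("style_b")
print(codebook.num_parameters())  # 2 styles x 16 dims = 32
```

In this sketch the generator-side weights are shared across styles and only the looked-up code changes, which is one way to read the claim that new styles add no inference overhead beyond a codebook entry.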

Paper Structure

This paper contains 23 sections, 10 equations, 13 figures, 2 tables, 2 algorithms.

Figures (13)

  • Figure 1: Trade-off between inference time $t$ (ms) and ArtFID (Wright & Ommer, 2022) achieved by different methods. The size of a circle represents FLOPs.
  • Figure 2: A 2K stylized sample ($2028\times1440$), rendered in about $0.01$ seconds on a single NVIDIA RTX 3090 GPU. The upper-left and lower-left images are the content and style images, respectively.
  • Figure 3: An overview of our multi-style transfer framework.
  • Figure 4: Visualization results of image details produced by different methods on a 2K image from the Flickr2K dataset. The whole content image is shown in Fig. 2.
  • Figure 5: Qualitative comparison with the state of the art. Please zoom in for the best view.
  • ...and 8 more figures