Pluggable Style Representation Learning for Multi-Style Transfer
Hongda Liu, Longguang Wang, Weijun Guan, Ye Zhang, Yulan Guo
TL;DR
Multi-style transfer often faces a trade-off between broad style coverage and inference efficiency. This work decouples style modeling from style transfer. Compact 16-dim style representations are learned and stored in a Style Codebook (SCB). A Style-aware Multi-style Transfer (SaMST) network then plugs these representations in to condition a single unified generator. The work introduces three style-conditioned components (SConv, SRAdaIN, and SCM) and an incremental training scheme that adds new styles without forgetting previously learned ones. Together these achieve over 4× model-size reduction and over 3× per-style speedup while maintaining or improving stylization quality. Quantitative results on standard benchmarks show state-of-the-art ArtFID, content fidelity, and global/local stylization metrics, and qualitative results show sharper details and better content preservation. The approach enables scalable, edge-friendly multi-style transfer and makes it easy to extend a deployed model to new styles.
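The sketch below shows how a pluggable style representation can condition a shared layer. It is a minimal illustration, not the authors' implementation, and it assumes an AdaIN-style design. In this design, a 16-dim code looked up from a codebook is mapped by shared linear weights to a per-channel scale and shift. The names `StyleCodebook`, `StyleModulatedNorm`, `w_gamma`, and `w_beta` are hypothetical. The exact forms of SRAdaIN, SConv, and SCM are not given in this section.

```python
import math
import random

STYLE_DIM = 16  # size of each style representation, as stated in the TL;DR


class StyleCodebook:
    """Hypothetical codebook holding one 16-dim vector per learned style."""

    def __init__(self, num_styles, dim=STYLE_DIM, seed=0):
        rng = random.Random(seed)
        self.codes = [[rng.gauss(0.0, 1.0) for _ in range(dim)]
                      for _ in range(num_styles)]

    def lookup(self, style_id):
        return self.codes[style_id]


class StyleModulatedNorm:
    """AdaIN-style normalization with scale/shift predicted from a style code.

    Assumed stand-in for a style-conditioned layer such as SRAdaIN; the
    paper's exact formulation is not reproduced here.
    """

    def __init__(self, channels, dim=STYLE_DIM, seed=1):
        rng = random.Random(seed)
        # Shared weights: their size depends on channels and dim, not on
        # how many styles the codebook holds.
        self.w_gamma = [[rng.gauss(0.0, 0.1) for _ in range(dim)]
                        for _ in range(channels)]
        self.w_beta = [[rng.gauss(0.0, 0.1) for _ in range(dim)]
                       for _ in range(channels)]

    @staticmethod
    def _dot(w, z):
        return sum(a * b for a, b in zip(w, z))

    def __call__(self, features, style_code, eps=1e-5):
        """features: one flat list of activations per channel."""
        out = []
        for c, channel in enumerate(features):
            mean = sum(channel) / len(channel)
            var = sum((x - mean) ** 2 for x in channel) / len(channel)
            std = math.sqrt(var + eps)
            # Per-channel affine parameters predicted from the style code.
            gamma = 1.0 + self._dot(self.w_gamma[c], style_code)
            beta = self._dot(self.w_beta[c], style_code)
            out.append([gamma * (x - mean) / std + beta for x in channel])
        return out


# Usage: switch styles by swapping the plugged-in code, not the weights.
codebook = StyleCodebook(num_styles=10)
norm = StyleModulatedNorm(channels=3)
features = [[float(i * (c + 1)) for i in range(8)] for c in range(3)]
stylized_a = norm(features, codebook.lookup(4))
stylized_b = norm(features, codebook.lookup(5))
```

In this sketch the shared weights have a fixed size of 2 × channels × 16 numbers. Adding a style therefore grows only the codebook, by 16 numbers per style, and the per-feature cost of a forward pass does not change.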
Abstract
Due to the high diversity of image styles, scalability to many styles plays a critical role in real-world applications. To accommodate a large number of styles, previous multi-style transfer approaches enlarge the model, while arbitrary-style transfer methods rely on heavy backbones. However, the additional computational cost introduced by these extra parameters hinders their deployment on resource-limited devices. To address this challenge, we develop a style transfer framework that decouples style modeling from style transfer. Specifically, for style modeling, we propose a style representation learning scheme that encodes style information into a compact representation. Then, for style transfer, we develop a style-aware multi-style transfer network (SaMST) that adapts to diverse styles using pluggable style representations. In this way, our framework accommodates diverse image styles in the learned style representations without introducing additional overhead during inference, thereby maintaining efficiency. Experiments show that our style representations capture accurate style information. Moreover, qualitative and quantitative results demonstrate that our method achieves state-of-the-art performance in terms of both accuracy and efficiency. The code is available at https://github.com/The-Learning-And-Vision-Atelier-LAVA/SaMST.
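The toy example below illustrates what it can mean to accommodate new styles in the representations rather than in the network. It assumes, for illustration only, that the shared network stays frozen and that only a new 16-dim style code is learned for each added style. The helpers `predict_affine`, `affine_loss`, and `learn_new_style` are hypothetical. The "style" targets are synthetic per-channel statistics rather than a real style loss. The paper's actual incremental training scheme may differ.

```python
import copy
import random

STYLE_DIM = 16  # size of each style representation, as stated in the TL;DR


def predict_affine(w_gamma, w_beta, code):
    """Frozen, shared mapping from a style code to per-channel (gamma, beta)."""
    gamma = [1.0 + sum(w * z for w, z in zip(row, code)) for row in w_gamma]
    beta = [sum(w * z for w, z in zip(row, code)) for row in w_beta]
    return gamma, beta


def affine_loss(w_gamma, w_beta, code, target_gamma, target_beta):
    """Squared error between predicted and target per-channel statistics."""
    gamma, beta = predict_affine(w_gamma, w_beta, code)
    return (sum((g - t) ** 2 for g, t in zip(gamma, target_gamma))
            + sum((b - t) ** 2 for b, t in zip(beta, target_beta)))


def learn_new_style(w_gamma, w_beta, target_gamma, target_beta,
                    lr=0.3, steps=500):
    """Gradient descent on a new 16-dim code only; shared weights stay fixed."""
    code = [0.0] * STYLE_DIM
    for _ in range(steps):
        gamma, beta = predict_affine(w_gamma, w_beta, code)
        grad = [0.0] * STYLE_DIM
        for rows, preds, targets in ((w_gamma, gamma, target_gamma),
                                     (w_beta, beta, target_beta)):
            for row, p, t in zip(rows, preds, targets):
                for k in range(STYLE_DIM):
                    grad[k] += 2.0 * (p - t) * row[k]
        code = [z - lr * dz for z, dz in zip(code, grad)]
    return code


# Toy setup: shared weights and three previously learned style codes.
rng = random.Random(0)
channels = 4
w_gamma = [[rng.gauss(0.0, 0.1) for _ in range(STYLE_DIM)] for _ in range(channels)]
w_beta = [[rng.gauss(0.0, 0.1) for _ in range(STYLE_DIM)] for _ in range(channels)]
codebook = [[rng.gauss(0.0, 1.0) for _ in range(STYLE_DIM)] for _ in range(3)]
snapshot = copy.deepcopy((w_gamma, w_beta, codebook))

# Synthetic targets for the new style, reachable by construction.
hidden_code = [rng.gauss(0.0, 1.0) for _ in range(STYLE_DIM)]
target_gamma, target_beta = predict_affine(w_gamma, w_beta, hidden_code)

new_code = learn_new_style(w_gamma, w_beta, target_gamma, target_beta)
codebook.append(new_code)  # the only change: one extra 16-dim row
```

In this toy, earlier styles cannot be forgotten because nothing they depend on is modified. This section does not say whether the paper's incremental scheme also keeps the shared network frozen.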
