Puff-Net: Efficient Style Transfer with Pure Content and Style Feature Fusion Network
Sizhe Zheng, Pan Gao, Peng Zhou, Jie Qin
TL;DR
Puff-Net addresses the challenge of achieving high-quality style transfer with preserved content while maintaining computational efficiency. It uses an encoder-only transformer to fuse pure content and pure style features, which are produced by two specialized feature extractors (an invertible neural network-based content extractor and a lite-transformer-based style extractor). The approach combines perceptual content and style losses with reconstruction/identity losses for the extractors, and leverages content-aware positional encoding to enhance alignment between content and style. Empirical results show Puff-Net achieves competitive stylization quality with lower model capacity and faster inference, enabling more practical on-device or real-time applications while preserving global structure.
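The summary above mentions an invertible neural network as the basis of the content extractor. As a rough illustration of why invertibility matters (no information about the input is discarded), here is a minimal sketch of one additive coupling step, a common building block of invertible networks. This is not taken from the paper: the function names (`coupling_forward`, `coupling_inverse`, `toy_shift`) are hypothetical, the input is a flat list of even length rather than an image feature map, and the layer design actually used in Puff-Net may differ.

```python
from typing import Callable, List, Tuple

Vector = List[float]


def split(x: Vector) -> Tuple[Vector, Vector]:
    """Split a vector of even length into two equal halves."""
    half = len(x) // 2
    return x[:half], x[half:]


def coupling_forward(x: Vector, f: Callable[[Vector], Vector]) -> Vector:
    """Additive coupling: keep the first half, shift the second half by f(first half)."""
    x1, x2 = split(x)
    y2 = [b + s for b, s in zip(x2, f(x1))]
    return x1 + y2


def coupling_inverse(y: Vector, f: Callable[[Vector], Vector]) -> Vector:
    """Exact inverse: the first half is unchanged, so f's shift can be recomputed and subtracted."""
    y1, y2 = split(y)
    x2 = [b - s for b, s in zip(y2, f(y1))]
    return y1 + x2


def toy_shift(h: Vector) -> Vector:
    """Hypothetical stand-in for a learned sub-network; f itself need not be invertible."""
    return [2.0 * v + 1.0 for v in h]


x = [1.0, 2.0, 3.0, 4.0]
y = coupling_forward(x, toy_shift)
x_recovered = coupling_inverse(y, toy_shift)
```

Because the inverse recomputes the same shift from the untouched half, the original input is recovered exactly, whatever function plays the role of `toy_shift`. In a real model that function would be a small learned network operating on feature channels.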
Abstract
Style transfer aims to render an image with the artistic features of a style image while maintaining the original structure. Various methods have been put forward for this task, but some challenges remain. For instance, it is difficult for CNN-based methods to handle global information and long-range dependencies between input images, which has motivated transformer-based methods. Although transformers can better model the relationship between content and style images, they require high-cost hardware and time-consuming inference. To address these issues, we design a novel transformer model that includes only the encoder, thus significantly reducing the computational cost. We also find that existing style transfer methods may produce images that are under-stylized or missing content. To achieve better stylization, we design a content feature extractor and a style feature extractor, based on which pure content and style images can be fed to the transformer. Finally, we propose a novel network termed Puff-Net, i.e., pure content and style feature fusion network. Through qualitative and quantitative experiments, we demonstrate the advantages of our model compared to state-of-the-art ones in the literature.
