HSI: A Holistic Style Injector for Arbitrary Style Transfer
Shuhao Zhang, Hui Kang, Yang Liu, Fang Mei, Hongjuan Li
TL;DR
Attention-based arbitrary style transfer (AST) often emphasizes local patterns and incurs quadratic computational cost. The Holistic Style Injector (HSI) replaces point-wise attention with global style statistics, dynamic dual relations, and a linear, element-wise transfer to achieve cohesive stylization with linear complexity. HSI extracts four global statistics (mean, variance, skewness, kurtosis) and combines local-content-to-global-style and global-content-to-global-style relations, so the stylization adapts to the semantic similarity between content and style, preserving content while improving style fidelity. Empirical results on COCO and WikiArt show superior style fidelity and content preservation with real-time performance and robust high-resolution transfer, suggesting that HSI is a scalable alternative for AST.
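To make the four global statistics concrete, the following is a minimal NumPy sketch and not the authors' implementation. It assumes the statistics are computed per channel over all spatial positions of a feature map, and that skewness and kurtosis are the standardized third and fourth moments (non-excess kurtosis). The function name `global_style_statistics` and the `eps` parameter are hypothetical.

```python
import numpy as np


def global_style_statistics(feat, eps=1e-5):
    """Per-channel mean, variance, skewness, and kurtosis.

    feat: array of shape (C, H, W), e.g. an encoder feature map (assumed layout).
    Returns an array of shape (C, 4).
    """
    c = feat.shape[0]
    x = feat.reshape(c, -1).astype(np.float64)  # flatten spatial dims: (C, H*W)

    mean = x.mean(axis=1)
    centered = x - mean[:, None]
    var = (centered ** 2).mean(axis=1)

    # Standardize so the higher moments are scale-invariant;
    # eps guards against division by zero on constant channels.
    z = centered / np.sqrt(var + eps)[:, None]
    skew = (z ** 3).mean(axis=1)
    kurt = (z ** 4).mean(axis=1)  # non-excess kurtosis (assumption)

    return np.stack([mean, var, skew, kurt], axis=1)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    style_feat = rng.standard_normal((64, 32, 32))  # stand-in for real features
    stats = global_style_statistics(style_feat)
    print(stats.shape)
```

Each channel is summarized by four numbers regardless of image size, so the style representation stays fixed in size as resolution grows.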
Abstract
Attention-based arbitrary style transfer methods have recently gained significant attention for their impressive ability to synthesize style details. However, the point-wise matching within the attention mechanism may focus excessively on local patterns and thereby neglect the prominent global features of style images. Additionally, when processing large images, the quadratic complexity of the attention mechanism imposes a high computational load. To alleviate these problems, we propose the Holistic Style Injector (HSI), a novel attention-style transformation module that delivers the artistic expression of the target style. Specifically, HSI performs stylization based only on a global style representation, which is more in line with the characteristics of style transfer, to avoid generating locally disharmonious patterns in stylized images. Moreover, we propose a dual relation learning mechanism inside HSI that dynamically renders images by leveraging the semantic similarity between content and style, ensuring that the stylized images preserve the original content and improve style fidelity. Note that the proposed HSI achieves linear computational complexity because it establishes feature mapping through element-wise multiplication rather than matrix multiplication. Qualitative and quantitative results demonstrate that our method outperforms state-of-the-art approaches in both effectiveness and efficiency.
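The complexity argument can be illustrated with a small sketch. This is not the HSI module: `elementwise_transfer` is a hypothetical stand-in that modulates every content position with one global style vector, and `operation_count` gives only rough multiply-add estimates. The layout is assumed to be N content positions and M style positions, each with C channels.

```python
import numpy as np


def attention_transfer(content, style):
    """Point-wise attention baseline: builds an (N, M) affinity map."""
    scores = content @ style.T / np.sqrt(content.shape[1])  # (N, M)
    scores -= scores.max(axis=1, keepdims=True)             # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)           # row-wise softmax
    return weights @ style                                  # (N, C)


def elementwise_transfer(content, style):
    """Hypothetical global modulation: no (N, M) map is ever formed."""
    global_style = style.mean(axis=0)   # (C,) global style descriptor
    return content * global_style       # broadcast over N positions: (N, C)


def operation_count(n, m, c):
    """Rough arithmetic cost of each path (illustrative estimate only)."""
    return {
        "attention": 2 * n * m * c,     # content @ style.T, then weights @ style
        "elementwise": (n + m) * c,     # style mean, then per-element product
    }


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    content = rng.standard_normal((1024, 8))
    style = rng.standard_normal((1024, 8))
    print(attention_transfer(content, style).shape)
    print(elementwise_transfer(content, style).shape)
    print(operation_count(1024, 1024, 8))
    print(operation_count(2048, 2048, 8))
```

Under these estimates, doubling both N and M quadruples the attention cost but only doubles the element-wise cost, which is the gap that matters at high resolution.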
