HSI: A Holistic Style Injector for Arbitrary Style Transfer

Shuhao Zhang, Hui Kang, Yang Liu, Fang Mei, Hongjuan Li

TL;DR

Attention-based arbitrary style transfer (AST) often emphasizes local patterns, and its computation is quadratic in the number of feature positions. The Holistic Style Injector (HSI) replaces point-wise attention with global style statistics, dynamic dual relations, and a linear, element-wise transfer, which yields cohesive stylization with linear complexity. HSI extracts four global statistics (mean, variance, skewness, kurtosis). It then combines local-content-to-global-style and global-content-to-global-style relations, so it adapts to the semantic similarity between content and style. This lets it better preserve content while improving style fidelity. Empirical results on COCO and WikiArt show superior style fidelity and content preservation, real-time performance, and robust high-resolution transfer, suggesting that HSI is a scalable alternative for AST.
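The four global statistics listed above can be computed per channel over all spatial positions of a feature map. The NumPy sketch below shows one way to do this. It assumes a (C, H, W) feature map, standardized third and fourth moments, and non-excess (Pearson) kurtosis. The function name `global_style_statistics` and the `eps` stabilizer are illustrative, and the paper's exact normalization may differ. Each statistic takes a single pass over the H·W positions, so the cost is linear in image size.

```python
import numpy as np


def global_style_statistics(feat: np.ndarray, eps: float = 1e-5) -> np.ndarray:
    """Per-channel mean, variance, skewness and kurtosis of a (C, H, W) feature map.

    Hypothetical helper for illustration; returns an array of shape (4, C).
    """
    c = feat.shape[0]
    x = feat.reshape(c, -1)                          # flatten spatial dims: (C, N)
    mean = x.mean(axis=1, keepdims=True)             # (C, 1)
    centered = x - mean
    var = (centered ** 2).mean(axis=1, keepdims=True)
    std = np.sqrt(var + eps)                         # eps avoids division by zero
    skew = (centered ** 3).mean(axis=1, keepdims=True) / std ** 3
    kurt = (centered ** 4).mean(axis=1, keepdims=True) / std ** 4  # non-excess kurtosis
    # Stack as columns (C, 4), then transpose to (4, C): one row per statistic.
    return np.concatenate([mean, var, skew, kurt], axis=1).T


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    style_feat = rng.standard_normal((512, 32, 32))  # e.g. a 512-channel encoder feature map
    print(global_style_statistics(style_feat).shape)
```

A symmetric channel such as `[[1, 2], [3, 4]]` has skewness 0, mean 2.5 and variance 1.25 under this definition.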

Abstract

Attention-based arbitrary style transfer methods have recently gained significant attention due to their impressive ability to synthesize style details. However, the point-wise matching within the attention mechanism may focus too heavily on local patterns and thus neglect the prominent global features of style images. Additionally, when processing large images, the quadratic complexity of the attention mechanism imposes a heavy computational load. To alleviate these problems, we propose the Holistic Style Injector (HSI), a novel attention-style transformation module that conveys the artistic expression of the target style. Specifically, HSI performs stylization based solely on a global style representation. This better matches the nature of style transfer and avoids generating locally disharmonious patterns in stylized images. Moreover, we propose a dual relation learning mechanism inside HSI that dynamically renders images by leveraging the semantic similarity between content and style. This ensures that the stylized images preserve the original content while improving style fidelity. Note that HSI achieves linear computational complexity because it establishes the feature mapping through element-wise multiplication rather than matrix multiplication. Qualitative and quantitative results demonstrate that our method outperforms state-of-the-art approaches in both effectiveness and efficiency.
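The abstract attributes HSI's linear complexity to element-wise rather than matrix multiplication. The NumPy sketch below contrasts the two patterns:

- `pointwise_attention` is ordinary scaled dot-product attention, which builds an N_c × N_s map.
- `elementwise_injection` is a hypothetical stand-in. A single global style vector `g` (assumed to be derived from style statistics) interacts with content features only through broadcasting.

The two relation terms loosely mirror the local-content-to-global-style and global-content-to-global-style relations. The sigmoid gate and the blending rule are illustrative assumptions, not the paper's equations. The sketch only shows where the quadratic term does or does not appear.

```python
import numpy as np


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    # Numerically stable softmax.
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def pointwise_attention(fc: np.ndarray, fs: np.ndarray) -> np.ndarray:
    """Point-wise attention: every content position attends to every style position.

    fc: (N_c, C) content features; fs: (N_s, C) style features.
    The (N_c, N_s) map makes time and memory grow as N_c * N_s.
    """
    attn = softmax(fc @ fs.T / np.sqrt(fc.shape[1]), axis=1)  # (N_c, N_s)
    return attn @ fs                                           # (N_c, C)


def elementwise_injection(fc: np.ndarray, g: np.ndarray) -> np.ndarray:
    """Hypothetical HSI-like injection with one global style vector g of shape (C,).

    Only element-wise products and per-channel reductions are used,
    so time and memory grow as N_c * C.
    """
    local_rel = fc * g                  # per-position, per-channel relation: (N_c, C)
    global_rel = fc.mean(axis=0) * g    # relation shared by all positions: (C,)
    gate = 1.0 / (1.0 + np.exp(-(local_rel + global_rel)))  # sigmoid, broadcast to (N_c, C)
    return gate * g + (1.0 - gate) * fc  # illustrative convex blend of style and content


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    fc = rng.standard_normal((4096, 512))  # 64 x 64 content positions
    fs = rng.standard_normal((4096, 512))  # 64 x 64 style positions
    g = fs.mean(axis=0)                    # crude global style vector (assumption)
    print(pointwise_attention(fc, fs).shape, elementwise_injection(fc, g).shape)
```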

Paper Structure

This paper contains 15 sections, 12 equations, 9 figures, 2 tables.

Figures (9)

  • Figure 1: Comparison between our method and several attention-based methods. (a) Image stylization results. Compared to SANet [sanet] and AdaAttN [adaattn], the image generated by our method is more consistent with the original content and more harmonious in its style patterns. (b) GPU memory consumption at different resolutions. Unlike the other methods, our method renders images from 256 $\times$ 256 to 2048 $\times$ 2048 resolution on a 24GB GPU (4090Ti) without running out of memory.
  • Figure 2: The network framework of our method.
  • Figure 3: Comparison of the structure and feature encoding process of self-attention (a) and our HSI module (b). HSI has a structure similar to self-attention but uses element-wise multiplication instead of matrix multiplication to model the semantic similarity between content features and style features.
  • Figure 4: Detailed illustration of the global style aggregation in the solid green box of Figure 3(b).
  • Figure 5: Qualitative comparisons with state-of-the-art AST methods. Zoom in for a better view.
  • ...and 4 more figures
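Figure 1(b) shows quadratic attention running out of memory at high resolution while an element-wise module does not. The sketch below estimates the size of one attention map versus one element-wise feature tensor to show why. It assumes:

- features at 1/8 of the input resolution (stride 8);
- 512 channels;
- float32 storage;
- a single map.

These are illustrative values, not the actual configurations of SANet, AdaAttN, or HSI. Real memory use also includes activations, weights, and framework workspace. The helper names are hypothetical.

```python
def positions(height: int, width: int, stride: int = 8) -> int:
    # Number of spatial positions in a feature map downsampled by `stride`.
    return (height // stride) * (width // stride)


def attention_map_bytes(height: int, width: int, stride: int = 8,
                        bytes_per_elem: int = 4) -> int:
    # One N x N attention map: quadratic in the number of positions N.
    n = positions(height, width, stride)
    return n * n * bytes_per_elem


def elementwise_tensor_bytes(height: int, width: int, channels: int = 512,
                             stride: int = 8, bytes_per_elem: int = 4) -> int:
    # One N x C tensor, the largest object an element-wise (broadcast) module needs: linear in N.
    n = positions(height, width, stride)
    return n * channels * bytes_per_elem


if __name__ == "__main__":
    for side in (256, 512, 1024, 2048):
        attn_mib = attention_map_bytes(side, side) / 2**20
        elem_mib = elementwise_tensor_bytes(side, side) / 2**20
        print(f"{side}x{side}: attention map {attn_mib:,.0f} MiB, "
              f"element-wise tensor {elem_mib:,.0f} MiB")
```

Under these assumptions, a single attention map for a 2048 × 2048 input occupies 16 GiB, against 128 MiB for the element-wise tensor. Doubling the side length multiplies the first by 16 and the second by 4.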