ArtAdapter: Text-to-Image Style Transfer using Multi-Level Style Encoder and Explicit Adaptation

Dar-Yen Chen; Hamish Tennent; Ching-Wen Hsu

Abstract

This work introduces ArtAdapter, a text-to-image (T2I) style transfer framework that goes beyond the traditional style attributes of color, brushstrokes, and object shape to capture high-level style elements such as composition and distinctive artistic expression. Integrating a multi-level style encoder with our proposed explicit adaptation mechanism enables ArtAdapter to achieve high-fidelity style transfer while remaining closely aligned with the textual description. In addition, an Auxiliary Content Adapter (ACA) separates content from style, reducing the tendency to borrow content from the style references. Our fast finetuning approach can further enhance the zero-shot style representation while mitigating the risk of overfitting. Comprehensive evaluations confirm that ArtAdapter surpasses current state-of-the-art methods.

Paper Structure (23 sections, 3 equations, 16 figures, 3 tables)

Introduction
Related Work
Text-to-Image Synthesis
T2I Personalization
Style Transfer
Approach
Multi-Level Style Encoder
Explicit Adaptation
Auxiliary Content Adapter
Fast Finetuning and Multiple Style References
Style Mixing
Experiments
Qualitative Evaluation
Style Mixing
Comparison with State-of-the-art Methods
...and 8 more sections

Figures (16)

Figure 1: Our framework captures faithful style representations, from delicate low-level texture to high-level minimalist composition, from either a single or multiple style references, while closely adhering to the textual prompts.
Figure 2: Architecture of ArtAdapter. Style embeddings, extracted through a cascade of a pretrained VGG [7486599] followed by the multi-level style encoder, interact with text embeddings in the text encoder. In the cross-attention layers, the Explicit Adaptation exclusively optimizes the style-related projection to align outputs with style references. The Auxiliary Content Adapter provides weak content guidance during training, helping disentangle the content structure in the style reference. Our approach faithfully captures the style features without content semantics.
Figure 3: Qualitative results. This collection of images exhibits ArtAdapter's capability to render faithful style representations across diverse artworks without compromising semantics, showcasing its versatility and its understanding of artistic and textual contexts.
Figure 4: Illustration of Style Mixing. The seamless integration of the two styles reflects the distinct contributions that style features from different hierarchical levels make to the final images, and demonstrates the flexibility of ArtAdapter.
Figure 5: Qualitative comparison on single style reference. Our results showcase ArtAdapter's superior style alignment over other approaches [zhang2020cast, Deng_2022_CVPR, Zhang_2023_CVPR, betker2023improving]. Note that the SD [rombach2022high] column works as content target images for conventional AST models [zhang2020cast, Deng_2022_CVPR].
...and 11 more figures
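
The Figure 2 caption says that, in the cross-attention layers, Explicit Adaptation optimizes only the style-related projection. The following is a minimal sketch of that idea, not the authors' implementation. It assumes that text and style tokens get separate key/value projections and are concatenated into one attention context, which this page does not confirm. The class `DualProjectionCrossAttention`, its attribute names, and all dimensions are hypothetical.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row):
    """Numerically stable softmax over one list of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def random_matrix(rows, cols, rng):
    """Random projection matrix scaled by 1/sqrt(rows)."""
    return [[rng.gauss(0.0, 1.0) / math.sqrt(rows) for _ in range(cols)]
            for _ in range(rows)]


class DualProjectionCrossAttention:
    """Hypothetical cross-attention layer with separate key/value
    projections for text tokens and style tokens (an assumption)."""

    def __init__(self, dim, ctx_dim, seed=0):
        rng = random.Random(seed)
        self.dim = dim
        self.w_q = random_matrix(dim, dim, rng)            # treated as frozen
        self.w_k_text = random_matrix(ctx_dim, dim, rng)   # treated as frozen
        self.w_v_text = random_matrix(ctx_dim, dim, rng)   # treated as frozen
        self.w_k_style = random_matrix(ctx_dim, dim, rng)  # trainable
        self.w_v_style = random_matrix(ctx_dim, dim, rng)  # trainable

    def trainable_names(self):
        # Only the style-related projections would be handed to an optimizer.
        return ["w_k_style", "w_v_style"]

    def attention_weights(self, x, text_tokens, style_tokens):
        q = matmul(x, self.w_q)
        # List concatenation stacks text keys and style keys into one context.
        k = matmul(text_tokens, self.w_k_text) + matmul(style_tokens, self.w_k_style)
        scores = matmul(q, list(zip(*k)))  # q @ k^T
        return [softmax([s / math.sqrt(self.dim) for s in row]) for row in scores]

    def __call__(self, x, text_tokens, style_tokens):
        v = matmul(text_tokens, self.w_v_text) + matmul(style_tokens, self.w_v_style)
        return matmul(self.attention_weights(x, text_tokens, style_tokens), v)


# Toy usage: 2 image-feature queries, 5 text tokens, 3 style tokens.
rng = random.Random(1)
x = [[rng.random() for _ in range(4)] for _ in range(2)]
text = [[rng.random() for _ in range(3)] for _ in range(5)]
style = [[rng.random() for _ in range(3)] for _ in range(3)]

layer = DualProjectionCrossAttention(dim=4, ctx_dim=3)
out = layer(x, text, style)
weights = layer.attention_weights(x, text, style)
```

In this sketch each query attends over all eight context tokens at once, so the style tokens compete with the text tokens for attention mass; whether ArtAdapter combines the two streams this way or through separate attention passes is left open here.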
