LogoSticker: Inserting Logos into Diffusion Models for Customized Generation

Mingkang Zhu; Xi Chen; Zhongdao Wang; Hengshuang Zhao; Jiaya Jia

TL;DR

This paper tackles the challenge of inserting user-provided logos into diffusion models to enable identity-preserving, context-aware generation. It introduces LogoSticker, a two-phase pipeline consisting of (i) an actor-critic relation pre-training phase that learns how logos should appear on diverse objects, and (ii) a decoupled identity learning phase that first binds a logo to a token and then distills its identity into the model. The method uses a CLIP-based critic and two synthesized datasets (a logo token binding set on solid backgrounds and an identity learning set on natural scenes) to achieve precise localization and faithful logo reproduction across contexts. Empirical results show that LogoSticker outperforms the Dreambooth, Textual Inversion, and ReVersion baselines and compares competitively with large models such as DALLE 3, both qualitatively and on quantitative measures of identity fidelity and prompt fidelity; it also extends to inpainting and multi-concept customization. The work offers practical benefits for advertising and branding by enabling robust, logo-aware generation in varied scenes.

Abstract

Recent advances in text-to-image model customization have underscored the importance of integrating new concepts from a few examples. Yet this progress is largely confined to widely recognized subjects, which can be learned with relative ease thanks to the models' adequate shared prior knowledge. In contrast, logos, characterized by unique patterns and textual elements, have little shared prior knowledge within diffusion models and thus present a unique challenge. To bridge this gap, we introduce the task of logo insertion. Our goal is to insert logo identities into diffusion models and enable their seamless synthesis in varied contexts. We present a novel two-phase pipeline, LogoSticker, to tackle this task. First, we propose the actor-critic relation pre-training algorithm, which addresses the nontrivial gaps in models' understanding of the potential spatial positioning of logos and their interactions with other objects. Second, we propose a decoupled identity learning algorithm, which enables precise localization and identity extraction of logos. LogoSticker can generate logos accurately and harmoniously in diverse contexts. We comprehensively validate the effectiveness of LogoSticker against customization methods and large models such as DALLE 3. Project page: https://mingkangz.github.io/logosticker

Paper Structure (12 sections, 2 equations, 8 figures, 1 table, 1 algorithm)

Introduction
Related Work
Methods
Preliminary
Actor-Critic Relation Pre-training
Decoupled Identity Learning
Experiments
Experimental Setup
Comparisons
More Applications
Ablation Studies
Conclusion

Figures (8)

Figure 1: Given a logo with fine-grained details, our method LogoSticker accurately distills its identity into diffusion models, thus supporting coherent text-to-image generation in diverse scenarios. It can also be extended to multi-object customization and to logo inpainting on user-provided images.
Figure 2: The overall pipeline of our proposed LogoSticker. (1) We first pre-train the text encoder and a token <painted> in an actor-critic fashion to learn the relation of logo placement in various contexts effectively. (2) We build the logo token binding set and optimize another special token <V> to bind it with the target logo so that the target logo in training images can be localized. Then, we build a more complex logo identity learning set and fine-tune the U-Net to capture the logo identity precisely.
Figure 3: Visualizations of attention maps for: (a) Common synonyms of the word "logo" and our special token optimized on the logo token binding set. (b) Class name tokens of commonplace items in Dreambooth's dataset (Ruiz et al., CVPR 2023). Attention maps are computed by averaging attention activation across time steps and layers.
Figure 4: Qualitative comparisons with baseline customization methods. LogoSticker successfully preserves the logo identity while others struggle. LogoSticker can synthesize the logo coherently on various objects. The logo identity is maintained even on curved objects or under various viewing positions.
Figure 5: Qualitative comparisons with large text-to-image models, including ControlNet and DALLE 3, using both detailed text and image prompts.
...and 3 more figures
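
The Figure 3 caption says attention maps are obtained by averaging attention activation across time steps and layers. A minimal sketch of that averaging is given below, assuming the per-token attention maps have already been extracted as 2-D arrays. The function names, the nearest-neighbour resizing to a common resolution, the 16x16 target size, and the min-max normalisation for display are all illustrative assumptions and are not taken from the paper.

```python
import numpy as np

def resize_nearest(attn_map, size):
    """Nearest-neighbour resize of a 2-D map to (size, size). Hypothetical helper."""
    h, w = attn_map.shape
    rows = np.arange(size) * h // size
    cols = np.arange(size) * w // size
    return attn_map[np.ix_(rows, cols)]

def average_attention(maps_per_step, size=16):
    """Average one token's attention maps over time steps and layers.

    maps_per_step: list over time steps; each entry is a list over layers
    of 2-D arrays (layers may have different spatial resolutions).
    """
    # Bring every map to a common resolution so they can be averaged.
    resized = [resize_nearest(m, size) for step in maps_per_step for m in step]
    mean_map = np.mean(resized, axis=0)
    # Min-max normalise for visualisation (an assumption, not from the paper).
    span = mean_map.max() - mean_map.min()
    return (mean_map - mean_map.min()) / span if span > 0 else mean_map
```

In this sketch every time step and layer contributes equally to the mean; a weighted average would be an equally plausible reading where the source does not specify.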
