
AnyLogo: Symbiotic Subject-Driven Diffusion System with Gemini Status

Jinghao Zhang, Wen Qian, Hao Luo, Fan Wang, Feng Zhao

TL;DR

This work presents AnyLogo, a zero-shot region customizer with remarkable detail consistency, built on a symbiotic diffusion system that eliminates cumbersome designs. It finds that rigorous signature extraction and creative content generation are compatible and can be systematically recycled within a single denoising model.

Abstract

Diffusion models have made compelling progress in facilitating high-throughput daily production. Nevertheless, appealing customization requirements still depend on instance-level finetuning for authentic fidelity. Prior zero-shot customization works achieve semantic consistency through the condensed injection of identity features, while addressing detailed low-level signatures through complex model configurations and subject-specific fabrications, which significantly break the statistical coherence of the overall system and limit applicability across scenarios. To facilitate generic signature concentration with rectified efficiency, we present AnyLogo, a zero-shot region customizer with remarkable detail consistency, built upon a symbiotic diffusion system that eliminates cumbersome designs. With the pipeline streamlined as vanilla image generation, we discern that rigorous signature extraction and creative content generation are promisingly compatible and can be systematically recycled within a single denoising model. In place of external configurations, the gemini status of the denoising model promotes reinforced subject transmission efficiency and a disentangled semantic-signature space with continuous signature decoration. Moreover, a sparse recycling paradigm is adopted to prevent the duplication risk, with a compressed transmission quota for diversified signature stimulation. Extensive experiments on constructed logo-level benchmarks demonstrate the effectiveness and practicability of our method.

Paper Structure (21 sections, 7 equations, 13 figures, 5 tables)


Figures (13)

  • Figure 1: Comparison of the overconfigured system and the symbiotic system in transmission efficiency. (a) The proportion of accumulative subject attention (ASA) and the transmitted statistic latent difference (SLD) at four distributed self-attention layers in the denoising model, where the SLD is computed between the transmitted subject latents and the corresponding denoising latents. The symbiotic system yields increasing transmission efficiency with deeper model operations. (b) Comprehensive attention analysis accumulated along the model layers and the denoising steps. Both the customized region (light) and the background area (dark) benefit from the boosted subject expertise of the symbiotic system. (c) Visual comparison of the two subject-driven diffusion systems. The detailed calculation process and a visual illustration of the two systems are provided in Appendix \ref{sec: symbotic}.
  • Figure 2: Preliminaries on the informative latents in the diffusion model. (a) Excluding the impact of the public VAE component: the zoomed-in reconstruction shows almost spotless fidelity. (b) Without bells and whistles, the inversion technique reproduces the original image from a single initial noise; the negligible deviation implies great signature potential within the denoising model.
  • Figure 3: Overview of AnyLogo, which transports the customized textured subject to the candidate region in the scene image. In the gemini status, signature extraction and content generation are performed alternately in each denoising step. The overconfigured extractor is discarded in favor of a model recycling policy for signature delivery. The transmission quota is compressed during training and released at inference to prevent the duplication risk and steer diversified signature representation.
  • Figure 3: Comparison of the overconfigured system and the symbiotic system on wild logo customization.
  • Figure 4: Comparison of signature interpolation between the symbiotic system and the overconfigured system, where the signature flows are delivered progressively with an increasing threshold, ranging from the blocked state to the fully released state during inference. The symbiotic system shows consistent semantic content and pleasing quality with flexible signature decoration.
  • ...and 8 more figures