DICE: Disentangling Artist Style from Content via Contrastive Subspace Decomposition in Diffusion Models

Tong Zhang, Ru Zhang, Jianyi Liu

TL;DR

DICE tackles the risk of unauthorized artist-style mimicry in diffusion-generated art by enabling training-free, inference-time style erasure. The method constructs contrastive triplets to disentangle artist style from content and casts the problem as a generalized eigenvalue optimization to identify a style subspace, which is then edited via an adaptive, token-wise attention strategy. By decoupling edits across Q, K, and V and employing an adaptive erasure controller, DICE achieves thorough style removal while preserving content with minimal artifacts, adding only about 3 seconds of overhead to disentangle a style. This makes deployment practical: new styles can be protected against on the fly, without per-style retraining or an explicit replacement style.

Abstract

The recent proliferation of diffusion models has made style mimicry effortless, enabling users to imitate unique artistic styles without authorization. On deployed platforms, this raises copyright and intellectual-property risks and calls for reliable protection. However, existing countermeasures either require costly weight editing as new styles emerge or rely on an explicitly specified editing style, limiting their practicality for deployment-side safety. To address this challenge, we propose DICE (Disentanglement of artist Style from Content via Contrastive Subspace Decomposition), a training-free framework for on-the-fly artist style erasure. Unlike style editing, which requires an explicitly specified replacement style, DICE performs style purification, removing the artist's characteristics while preserving the user-intended content. Our core insight is that a model cannot truly comprehend an artist's style from a single text or image alone. Consequently, we abandon the traditional paradigm of identifying style from isolated samples. Instead, we construct contrastive triplets to compel the model to distinguish between style and non-style features in the latent space. By formalizing this disentanglement process as a solvable generalized eigenvalue problem, we achieve precise identification of the style subspace. Furthermore, we introduce an Adaptive Attention Decoupling Editing strategy that dynamically assesses the style concentration of each token and performs differential suppression and content enhancement on the QKV vectors. Extensive experiments demonstrate that DICE achieves a superior balance between the thoroughness of style erasure and the preservation of content integrity. DICE introduces an additional overhead of only 3 seconds to disentangle style, providing a practical and efficient technique for curbing style mimicry.
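
To make the "generalized eigenvalue problem" in the abstract concrete, the following is a minimal sketch rather than the authors' formulation. It assumes, hypothetically, that the contrastive triplets yield two sets of feature differences: `style_diffs` (pairs differing only in style) and `content_diffs` (pairs differing only in content). Directions with high variance in the first set and low variance in the second maximize a generalized Rayleigh quotient, i.e. they solve $S_{style}\,u = \lambda\,S_{content}\,u$. The function name, the ridge term `eps`, and the synthetic data are all illustrative assumptions.

```python
import numpy as np

def style_subspace(style_diffs, content_diffs, k=1, eps=1e-3):
    """Top-k generalized eigenvectors of the pair (S_style, S_content).

    style_diffs:   (n, d) feature differences where only style changes (assumed input)
    content_diffs: (m, d) feature differences where only content changes (assumed input)
    """
    d = style_diffs.shape[1]
    s_style = style_diffs.T @ style_diffs / len(style_diffs)
    s_content = content_diffs.T @ content_diffs / len(content_diffs)
    s_content = s_content + eps * np.eye(d)  # ridge term keeps it positive definite

    # Reduce S_style u = lam * S_content u to a symmetric standard eigenproblem
    # using the Cholesky factor of S_content (s_content = chol @ chol.T).
    chol = np.linalg.cholesky(s_content)
    tmp = np.linalg.solve(chol, s_style)        # chol^{-1} S_style
    reduced = np.linalg.solve(chol, tmp.T)      # chol^{-1} S_style chol^{-T}
    reduced = 0.5 * (reduced + reduced.T)       # remove round-off asymmetry

    eigvals, w = np.linalg.eigh(reduced)        # eigenvalues in ascending order
    top = w[:, ::-1][:, :k]                     # eigenvectors of the k largest
    u = np.linalg.solve(chol.T, top)            # map back: u = chol^{-T} w
    u = u / np.linalg.norm(u, axis=0, keepdims=True)
    return u, eigvals[::-1][:k]

# Synthetic check: axis 0 plays the role of "style", the other axes of "content".
rng = np.random.default_rng(0)
d = 8
style_diffs = 0.1 * rng.normal(size=(200, d))
style_diffs[:, 0] += 3.0 * rng.normal(size=200)
content_diffs = rng.normal(size=(200, d))
content_diffs[:, 0] *= 0.1

u_style, lam = style_subspace(style_diffs, content_diffs, k=1)
```

On this toy data the recovered direction `u_style[:, 0]` should align closely with axis 0, since that is the only axis where the style differences dominate the content differences.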

Paper Structure

This paper contains 40 sections, 38 equations, 19 figures, 7 tables.

Figures (19)

  • Figure 1: The style subspace identification in DICE. (a) We construct contrastive triplets and feed their features into a Contrastive Canonical Correlation Analysis (CCA), which formalizes style-content disentanglement as a generalized eigenvalue problem to find the style direction $u^*$. (b) The Token Alignment mechanism corrects spatial misalignment before CCA.
  • Figure 2: Overall architecture of the DICE framework. We solve a generalized eigenvalue problem on features from contrastive triplets to compute the style subspace $U_{style}$. During inference, the pre-computed $U_{style}$ is used to perform Orthogonal Suppression on U-Net features, adaptively removing style components at each denoising step while preserving content.
  • Figure 3: Qualitative comparison for erasing the "Van Gogh", "Adrian Ghenie", and "Chuck Close" artistic styles. Our method demonstrates superior performance by completely removing stylistic features (e.g., swirling brushstrokes and specific color palettes) while best preserving the content structure. In contrast, the baseline methods exhibit issues such as residual style, severe content degradation, or the introduction of irrelevant artifacts.
  • Figure 4: Quantitative CLIP Score evaluation for the ablation study of the AEC. The left sub-figure reports $cs_{style}$ computed on five style-only prompt templates (lower is better). The right sub-figure reports $cs_{content}$ computed on five content-only prompt templates, measuring content retention (higher is better).
  • Figure 5: Qualitative ablation of the Adaptive Erasure Controller. The experiment shows that the combination of the fusion calculation of $Q, K, V$ and nonlinear control is crucial for eliminating the style while preserving the content.
  • ...and 14 more figures
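
The Orthogonal Suppression step named in the Figure 2 caption, combined with the token-wise "style concentration" mentioned in the abstract, can be illustrated with a small sketch. This is not the paper's Adaptive Erasure Controller, whose exact form is not given in this excerpt. The sigmoid gate and the `sharpness` and `threshold` parameters are placeholder assumptions standing in for its nonlinear control, and `suppress_style` is a hypothetical name.

```python
import numpy as np

def suppress_style(features, u_style, sharpness=10.0, threshold=0.5):
    """Remove the style-subspace component of each token, gated per token.

    features: (tokens, d) array, e.g. Q, K, or V vectors of one attention layer
    u_style:  (d, k) basis of the style subspace (columns need not be orthonormal)
    """
    q, _ = np.linalg.qr(u_style)             # orthonormal basis of the subspace
    coords = features @ q                    # (tokens, k) subspace coordinates
    style_part = coords @ q.T                # projection onto the style subspace

    # Fraction of each token's energy that lies inside the style subspace.
    energy = np.sum(features ** 2, axis=1) + 1e-12
    concentration = np.sum(coords ** 2, axis=1) / energy

    # Placeholder nonlinear gate: strongly suppress style-heavy tokens only.
    strength = 1.0 / (1.0 + np.exp(-sharpness * (concentration - threshold)))
    return features - strength[:, None] * style_part, concentration

# Toy example: token 0 is pure "style", token 1 is pure "content".
d = 4
u = np.zeros((d, 1))
u[0, 0] = 1.0
tokens = np.zeros((2, d))
tokens[0, 0] = 5.0
tokens[1, 1] = 1.0

cleaned, conc = suppress_style(tokens, u)
```

In this toy case the style-only token is almost entirely removed while the content-only token passes through unchanged. In a decoupled Q/K/V setting, one would presumably call such a function separately on each of the three projections with different gate parameters, but that choice is an assumption here.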