TokenTrace: Multi-Concept Attribution through Watermarked Token Recovery

Li Zhang, Shruti Agarwal, John Collomosse, Pengtao Xie, Vishal Asnani

TL;DR

TokenTrace is a proactive watermarking framework for robust multi-concept attribution. It achieves state-of-the-art performance on both single-concept and multi-concept attribution tasks, significantly outperforming existing baselines while maintaining high visual quality and robustness to common transformations.

Abstract

Generative AI models pose a significant challenge to intellectual property (IP), as they can replicate unique artistic styles and concepts without attribution. While watermarking offers a potential solution, existing methods often fail in complex scenarios where multiple concepts (e.g., an object and an artistic style) are composed within a single image. These methods struggle to disentangle and attribute each concept individually. In this work, we introduce TokenTrace, a novel proactive watermarking framework for robust, multi-concept attribution. Our method embeds secret signatures into the semantic domain by simultaneously perturbing the text prompt embedding and the initial latent noise that guide the diffusion model's generation process. For retrieval, we propose a query-based TokenTrace module that takes the generated image and a textual query specifying which concepts need to be retrieved (e.g., a specific object or style) as inputs. This query-based mechanism allows the module to disentangle and independently verify the presence of multiple concepts from a single generated image. Extensive experiments show that our method achieves state-of-the-art performance on both single-concept (object and style) and multi-concept attribution tasks, significantly outperforming existing baselines while maintaining high visual quality and robustness to common transformations.
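
The dual perturbation described in the abstract can be illustrated with a toy sketch. It is not the authors' implementation: in the paper the perturbations come from learned networks, whereas here fixed random linear maps stand in for them. The names `embed_secret`, `random_matrix`, and `strength`, along with all dimensions, are assumptions made for illustration only.

```python
import random


def random_matrix(rows, cols, rng, scale=0.1):
    """Random linear map; a hypothetical stand-in for a learned network."""
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def embed_secret(secret_bits, token_embedding, initial_noise, rng, strength=0.05):
    """Add a small secret-dependent offset to BOTH conditioning inputs.

    Assumption: additive perturbation with a fixed strength; the paper's
    actual perturbation rule is not specified in this section.
    """
    # Map bits {0, 1} to signs {-1, +1} so every bit contributes to the offset.
    signs = [1.0 if bit else -1.0 for bit in secret_bits]
    to_token = random_matrix(len(token_embedding), len(signs), rng)
    to_noise = random_matrix(len(initial_noise), len(signs), rng)
    token_delta = matvec(to_token, signs)
    noise_delta = matvec(to_noise, signs)
    marked_token = [e + strength * d for e, d in zip(token_embedding, token_delta)]
    marked_noise = [n + strength * d for n, d in zip(initial_noise, noise_delta)]
    return marked_token, marked_noise


rng = random.Random(0)
secret = [1, 0, 1, 1, 0, 0, 1, 0]               # toy 8-bit concept secret
token = [rng.gauss(0, 1) for _ in range(16)]    # toy concept-token embedding
noise = [rng.gauss(0, 1) for _ in range(32)]    # toy initial latent noise
marked_token, marked_noise = embed_secret(secret, token, noise, rng)
```

In a real pipeline, the perturbed embedding and noise would then condition the diffusion model. The sketch only shows that one secret drives both inputs while leaving each close to its original value.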

Paper Structure

This paper contains 35 sections, 8 equations, 16 figures, 5 tables, and 3 algorithms.

Figures (16)

  • Figure 1: t-SNE visualization of predicted concept embeddings. The embeddings retrieved by the TokenTrace module from DreamBooth-generated images (dots) form distinct, well-separated clusters around their ground-truth embeddings (stars), validating its ability to perform concept attribution.
  • Figure 2: Overview of TokenTrace. (a) Concept encoding: A concept secret is fed into a concept encoder to perturb the targeted concept token and into a secret mapper to perturb the initial noise, enabling dual conditioning of the latent diffusion model. (b) Concept decoding: The generated image and a query prompt are fed into the TokenTrace module to predict a concept embedding, which a secret decoder then translates back into the original concept secret for verification. (c) TokenTrace module: We use frozen image and text encoders, whose features are fused by trainable projection and attention layers to predict the final concept embedding.
  • Figure 3: Qualitative analysis of visual fidelity for watermarked images. (a) Results on abstract artistic style concepts from the WikiArt dataset. (b) Results on diverse object concepts from the ImageNet dataset. In both cases, the "Watermarked" images are visually indistinguishable from the "Clean" originals.
  • Figure 4: Qualitative example of multi-customized concept prediction. We generate two images using a single prompt containing multiple watermarked concepts, and report the average bit accuracy and average attribution accuracy for each prompt.
  • Figure 5: Qualitative example of multi-general concept prediction. We generate four images using a single prompt containing multiple watermarked concepts, and report the average bit accuracy and average attribution accuracy.
  • ...and 11 more figures
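
The captions for Figures 4 and 5 report average bit accuracy over several generated images. This section does not define the metric, so the sketch below assumes the common definition: the fraction of decoded secret bits that match the ground-truth secret, averaged over images. The function names and the example bit strings are hypothetical.

```python
def bit_accuracy(predicted_bits, true_bits):
    """Fraction of positions where the decoded bit equals the true bit."""
    if len(predicted_bits) != len(true_bits):
        raise ValueError("bit strings must have the same length")
    matches = sum(p == t for p, t in zip(predicted_bits, true_bits))
    return matches / len(true_bits)


def average_bit_accuracy(decoded_per_image, true_bits):
    """Mean bit accuracy over all images generated from one prompt."""
    scores = [bit_accuracy(bits, true_bits) for bits in decoded_per_image]
    return sum(scores) / len(scores)


true_secret = [1, 0, 1, 1, 0, 0, 1, 0]
decoded = [
    [1, 0, 1, 1, 0, 0, 1, 0],  # image 1: all 8 bits recovered
    [1, 0, 1, 0, 0, 0, 1, 1],  # image 2: 6 of 8 bits recovered
]
avg = average_bit_accuracy(decoded, true_secret)
print(avg)  # 0.875
```

Attribution accuracy, the second metric named in the captions, is left out because its definition is not given here.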