HMARK: Radioactive Multi-Bit Semantic-Latent Watermarking for Diffusion Models
Kexin Li, Guozhen Ding, Ilya Grishchenko, David Lie
TL;DR
HMARK addresses unauthorized data use in diffusion-model training by embedding multi-bit, semantic-latent watermarks directly in the h-space of the reverse diffusion process. An autoencoder learns latent residuals that encode secret bits and injects them at the final timestep, yielding a radioactive watermark that persists through finetuning while maintaining high perceptual fidelity. The approach achieves near-perfect watermark detection (≈98–99%), high bit-recovery accuracy (≈87–98%), and robust performance under common distortions, with optional BCH error correction to improve bit recovery at larger bit lengths. These results point to a practical, verifiable method for dataset ownership protection and IP tracing in generative AI pipelines. The work also discusses its limitations and situates itself within the broader landscape of watermarking, cloaking, and data-poisoning defenses.
Abstract
Modern generative diffusion models rely on vast training datasets, which often include images with uncertain ownership or usage rights. Radioactive watermarks, that is, marks that transfer to a model's outputs, can help detect when such data has been used for training without authorization. Beyond radioactivity, an effective watermark for protecting images from unauthorized training must also satisfy the established requirements of imperceptibility, robustness, and multi-bit capacity. To meet these requirements together, we propose HMARK, a novel multi-bit watermarking scheme that encodes ownership information as secret bits in the semantic-latent space (h-space) of image diffusion models. Because h-space is interpretable and semantically meaningful, watermark signals embedded there correspond to meaningful semantic attributes; as a result, the watermarks embedded by HMARK exhibit radioactivity, robustness to distortions, and minimal impact on perceptual quality. Experimental results show that, on images produced by a downstream adversarial model finetuned with LoRA on watermarked data and evaluated across various types of distortions, HMARK achieves 98.57% watermark detection accuracy, 95.07% bit-level recovery accuracy, a 100% recall rate, and an AUC of 1.0.
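To make the two headline metrics concrete, the following minimal sketch shows one common way bit-level recovery accuracy and a watermark detection decision can be computed for a multi-bit scheme. It is an illustration under stated assumptions, not HMARK's actual detector, which this section does not describe: the function names (bit_accuracy, match_p_value, is_watermarked), the 16-bit example message, and the significance level alpha are all hypothetical, and the detection rule assumed here is a one-sided binomial test in which an unwatermarked image matches each secret bit with probability 0.5.

```python
from math import comb


def bit_accuracy(decoded, secret):
    """Fraction of decoded bits that agree with the secret bits."""
    if len(decoded) != len(secret):
        raise ValueError("decoded and secret must have the same length")
    matches = sum(d == s for d, s in zip(decoded, secret))
    return matches / len(secret)


def match_p_value(matches, n_bits):
    """P(X >= matches) for X ~ Binomial(n_bits, 0.5).

    Assumption: without a watermark, each decoded bit matches the secret
    independently with probability 0.5.
    """
    tail = sum(comb(n_bits, k) for k in range(matches, n_bits + 1))
    return tail / 2 ** n_bits


def is_watermarked(decoded, secret, alpha=1e-3):
    """Flag an image as watermarked when the match count is unlikely by chance.

    alpha is a hypothetical false-positive budget, not a value from the paper.
    """
    matches = sum(d == s for d, s in zip(decoded, secret))
    return match_p_value(matches, len(secret)) <= alpha


if __name__ == "__main__":
    # Hypothetical 16-bit ownership message and a decoding with one bit flipped.
    secret = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 1, 0, 0, 1, 0, 1]
    decoded = list(secret)
    decoded[3] ^= 1

    print(bit_accuracy(decoded, secret))    # 0.9375
    print(is_watermarked(decoded, secret))  # True
```

Under this assumed rule, detection can succeed even when a few bits are decoded incorrectly, which is one reason detection accuracy can exceed bit-level recovery accuracy.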
