MOLM: Mixture of LoRA Markers
Samar Fares, Nurbek Tastan, Noor Hussein, Karthik Nandakumar
TL;DR
The paper addresses the authenticity and attribution of images synthesized by diffusion models by framing watermarking as a key-dependent perturbation of frozen model parameters. It introduces Mixture of LoRA Markers (MOLM), a routing-based approach in which a binary key selects lightweight LoRA adapters across blocks, embedding a watermark without retraining the backbone and enabling robust key extraction. Empirical results on Stable Diffusion v1.5 and FLUX show strong key recovery ($>0.98$ bit accuracy), minimal perceptual impact (FID degradation $\leq 1.5$), and resilience to distortions, averaging attacks, diffusion-based removal, and black-box adversarial attacks on the extractor. The method is reported to be efficient, scalable, and transferable across datasets, with a reproducibility plan and ablations that guide capacity and routing choices.
Abstract
Generative models can produce photorealistic images at scale. This raises urgent concerns about the ability to detect synthetically generated images and attribute them to specific sources. While watermarking has emerged as a possible solution, existing methods remain fragile to realistic distortions, susceptible to adaptive removal, and expensive to update when the underlying watermarking key changes. We propose a general watermarking framework that formulates the encoding problem as a key-dependent perturbation of the parameters of a generative model. Within this framework, we introduce Mixture of LoRA Markers (MOLM), a routing-based instantiation in which binary keys activate lightweight LoRA adapters inside residual and attention blocks. This design avoids key-specific re-training and achieves desired properties such as imperceptibility, fidelity, verifiability, and robustness. Experiments on Stable Diffusion and FLUX show that MOLM preserves image quality while achieving robust key recovery against distortions, compression and regeneration, averaging attacks, and black-box adversarial attacks on the extractor.
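To make the routing idea concrete, the following is a minimal NumPy sketch of key-dependent LoRA selection. It is not the authors' implementation. It assumes, purely for illustration, one key bit per block and two candidate adapters per block, and it replaces real residual and attention blocks with small linear layers. The names `RoutedLoRALinear`, `route`, and `bit_accuracy` are hypothetical.

```python
import numpy as np


class RoutedLoRALinear:
    """Toy frozen linear layer with two candidate LoRA adapters.

    Assumption (not from the paper): each block consumes exactly one key bit,
    and that bit picks which of two low-rank adapters is active.
    """

    def __init__(self, d_in, d_out, rank=4, scale=0.1, seed=0):
        rng = np.random.default_rng(seed)
        # Frozen backbone weight; it is never updated per key.
        self.W = rng.standard_normal((d_out, d_in)) / np.sqrt(d_in)
        # Adapter i is the low-rank pair (B[i], A[i]) for i in {0, 1}.
        self.A = rng.standard_normal((2, rank, d_in)) * scale
        self.B = rng.standard_normal((2, d_out, rank)) * scale

    def effective_weight(self, bit):
        # Key-dependent perturbation of the frozen parameters: W + B_bit A_bit.
        return self.W + self.B[bit] @ self.A[bit]

    def forward(self, x, bit):
        return x @ self.effective_weight(bit).T


def route(layers, key, x):
    """Run x through the stack, letting key[i] select the adapter in block i."""
    if len(key) != len(layers):
        raise ValueError("this sketch expects one key bit per block")
    for layer, bit in zip(layers, key):
        x = np.tanh(layer.forward(x, int(bit)))
    return x


def bit_accuracy(key, recovered):
    """Fraction of key bits that an extractor recovered correctly."""
    return float(np.mean(np.asarray(key) == np.asarray(recovered)))


# Usage: two keys that differ in one bit activate different adapters,
# so the same input yields different outputs without any retraining.
layers = [RoutedLoRALinear(8, 8, seed=i) for i in range(4)]
x = np.ones((2, 8))
out_a = route(layers, [0, 1, 1, 0], x)
out_b = route(layers, [1, 1, 1, 0], x)
```

Switching keys in this sketch only changes which stored adapters are applied, which mirrors the claim that no key-specific re-training is needed. The extractor that recovers the key from a generated image is outside the scope of this example; `bit_accuracy` only shows how a recovered key would be scored against the embedded one.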
