MOLM: Mixture of LoRA Markers

Samar Fares, Nurbek Tastan, Noor Hussein, Karthik Nandakumar

TL;DR

The paper tackles authenticity and attribution of synthetically generated images from diffusion models by framing watermarking as key-dependent perturbations to frozen model parameters. It introduces Mixture of LoRA Markers (MOLM), a routing-based approach that uses binary keys to select lightweight LoRA adapters across blocks, embedding a watermark without retraining the backbone and enabling robust key extraction. Empirical results on Stable Diffusion v1.5 and FLUX show strong key recovery ($>0.98$ bit accuracy), minimal perceptual impact ($\text{FID}$ degradation $\leq 1.5$), and resilience to distortions, averaging attacks, diffusion-based removal, and white-box adversarial attacks. The method is efficient, scalable, transferable across datasets, and suitable for real-world deployment, with a clear reproducibility plan and extensive ablations guiding capacity and routing choices.

Abstract

Generative models can generate photorealistic images at scale. This raises urgent concerns about the ability to detect synthetically generated images and attribute these images to specific sources. While watermarking has emerged as a possible solution, existing methods remain fragile to realistic distortions, susceptible to adaptive removal, and expensive to update when the underlying watermarking key changes. We propose a general watermarking framework that formulates the encoding problem as key-dependent perturbation of the parameters of a generative model. Within this framework, we introduce Mixture of LoRA Markers (MOLM), a routing-based instantiation in which binary keys activate lightweight LoRA adapters inside residual and attention blocks. This design avoids key-specific re-training and achieves the desired properties such as imperceptibility, fidelity, verifiability, and robustness. Experiments on Stable Diffusion and FLUX show that MOLM preserves image quality while achieving robust key recovery against distortions, compression and regeneration, averaging attacks, and black-box adversarial attacks on the extractor.
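
The routing idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the chunking rule (a fixed number of key bits per block), the function names `key_to_routing` and `routed_weight`, and the `scale` parameter are all assumptions made for illustration. The sketch only shows how a binary key could pick one low-rank update per block while the backbone weight itself is left untouched.

```python
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def key_to_routing(key: Sequence[int], bits_per_block: int = 1) -> List[int]:
    """Split a binary key into per-block chunks; each chunk indexes one adapter.

    Assumption: every block consumes the same number of key bits.
    """
    if any(bit not in (0, 1) for bit in key):
        raise ValueError("key must contain only 0s and 1s")
    if bits_per_block < 1 or len(key) % bits_per_block != 0:
        raise ValueError("key length must be a multiple of bits_per_block")
    routing = []
    for start in range(0, len(key), bits_per_block):
        index = 0
        for bit in key[start:start + bits_per_block]:
            index = (index << 1) | bit  # read the chunk as a binary number
        routing.append(index)
    return routing


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python matrix product, enough for a toy example."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def routed_weight(
    frozen: Matrix,
    adapters: Sequence[Tuple[Matrix, Matrix]],
    choice: int,
    scale: float = 1.0,
) -> Matrix:
    """Return frozen + scale * (up @ down) for the adapter selected by `choice`.

    Each adapter is a hypothetical (down, up) pair with shapes (r x in, out x r).
    The frozen weight is not modified in place.
    """
    down, up = adapters[choice]
    delta = matmul(up, down)
    return [
        [w + scale * d for w, d in zip(w_row, d_row)]
        for w_row, d_row in zip(frozen, delta)
    ]


# Toy usage: a 4-bit key, 2 bits per block, so two blocks with up to 4 adapters each.
routing = key_to_routing([1, 0, 1, 1], bits_per_block=2)  # [2, 3]
```

Under this assumed scheme, changing the key only changes which stored adapters are selected, which is consistent with the abstract's claim that no key-specific re-training is needed.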

Paper Structure

This paper contains 41 sections, 16 equations, 11 figures, 6 tables.

Figures (11)

  • Figure 1: Proposed watermarking framework. A text prompt $\mathbf{t}$ is processed by both the frozen generator $\mathcal{G}_\Phi$ (producing clean image $\mathbf{x}$) and the perturbed generator $\mathcal{G}_{\Phi + \Delta\Phi(\mathbf{\kappa})}$ (producing watermarked image $\tilde{\mathbf{x}}$), where $\Delta\Phi(\mathbf{\kappa})$ denotes a key-dependent parameter perturbation. During training (left), the extractor $\mathcal{V}_{\eta}$ recovers the embedded key $\tilde{\mathbf{\kappa}}$, optimized with perceptual loss $\mathcal{L}_{\text{imp}}$ and key loss $\mathcal{L}_{\text{ver}}$. At deployment (right), the model owner/verifier verifies and attributes generated images by extracting and matching the embedded key.
  • Figure 2: MOLM generation pipeline. A binary key $\mathbf{\kappa}$ is mapped into a routing collection $\{s_\ell\}_{\ell \in [L]}$ that determines the active LoRA adapters $\{\mathcal{A}_\ell^{(s_\ell)}\}$ across ResNet and Attention blocks of the frozen generator. During the diffusion sampling process, this routing implements the perturbation $\Delta\Phi(\mathbf{\kappa})$, yielding the watermarked image $\tilde{\mathbf{x}} = \mathcal{G}_{\Phi+\Delta\Phi(\mathbf{\kappa})}(\mathbf{t})$ for a given prompt $\mathbf{t}$. The backbone weights $\Phi$ remain frozen, ensuring no added inference cost.
  • Figure 3: Image generation quality. Visual comparison between Stable Diffusion (SD) and MOLM on MS COCO (left four columns) and LAION Aesthetics (right four columns). For each prompt, we show the original Stable Diffusion image and the corresponding watermarked image. MOLM preserves high image quality.
  • Figure 4: Averaging attack evaluation: MOLM vs. WOUAF (same message). (Left) Forgery attack. MOLM stays at the chance level ($\sim0.5$). (Right) Removal attack. MOLM achieves accuracy $\geq 0.96$, whereas WOUAF degrades to $\sim0.85$–$0.90$.
  • Figure 5: Routing the UNet. Comparison between vanilla Stable Diffusion (left), MOLM with decoder-only routing (28 bits, middle), and MOLM with UNet routing (108 bits, right). While capacity increases, routing the UNet introduces visible artifacts and degrades fidelity.
  • ...and 6 more figures
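
The captions above report results as bit accuracy, with $\sim0.5$ being chance level, and describe verification as extracting and matching the embedded key. The sketch below shows one way such a metric and a matching step could be computed. The registry structure, the function names, and the `threshold` value are hypothetical and are not taken from the paper.

```python
from typing import Dict, Optional, Sequence, Tuple


def bit_accuracy(extracted: Sequence[int], reference: Sequence[int]) -> float:
    """Fraction of positions where the extracted key agrees with the reference key."""
    if len(extracted) != len(reference) or len(reference) == 0:
        raise ValueError("keys must be non-empty and of equal length")
    matches = sum(int(e == r) for e, r in zip(extracted, reference))
    return matches / len(reference)


def attribute(
    extracted: Sequence[int],
    registry: Dict[str, Sequence[int]],
    threshold: float = 0.9,  # hypothetical decision threshold
) -> Optional[Tuple[str, float]]:
    """Return the best-matching (owner, accuracy), or None if no key clears the threshold."""
    best: Optional[Tuple[str, float]] = None
    for owner, key in registry.items():
        accuracy = bit_accuracy(extracted, key)
        if best is None or accuracy > best[1]:
            best = (owner, accuracy)
    if best is None or best[1] < threshold:
        return None
    return best


# Toy usage with made-up 4-bit keys.
registry = {"model_a": [1, 0, 1, 1], "model_b": [0, 1, 0, 0]}
result = attribute([1, 0, 1, 1], registry)  # ("model_a", 1.0)
```

For a random key unrelated to the image, each bit agrees with probability one half, which is why a forgery attempt that stays near 0.5 is read as a failure of the attack.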