SEAL: Entangled White-box Watermarks on Low-Rank Adaptation

Giyeong Oh, Saejin Kim, Woohyun Cho, Sangkyu Lee, Jiwan Chung, Dokyung Song, Youngjae Yu

TL;DR

SEAL is a universal white-box watermarking scheme for LoRA. It inserts a non-trainable passport matrix $C$ between LoRA's trainable blocks $B$ and $A$, which entangles the watermark with the adaptation during training. Afterward, $C$ is decomposed into two factors that are merged into $B$ and $A$, which hides the passport in the public weights. The approach achieves robust ownership verification without degrading host performance. It resists removal, obfuscation, and ambiguity attacks through a distributed, multi-passport framework and through entanglement in both the forward and backward passes. Across commonsense reasoning, textual and visual instruction tuning, and text-to-image synthesis, SEAL achieves fidelity comparable to standard LoRA while remaining robust to pruning and finetuning attacks and resilient to structural obfuscation. SEAL also accommodates LoRA variants and generalized low-rank operators, suggesting broad applicability to PEFT methods with minimal overhead. The work enables practical IP protection for open-source LoRA weights, providing verifiable ownership without sacrificing performance, and points to future extensions to additional operators and broader modular watermarking strategies.
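The insert-then-hide idea can be checked numerically: if $C = C_1 C_2$, then publishing $B' = BC_1$ and $A' = C_2A$ reproduces exactly the update $BCA$ learned during training. Below is a minimal pure-Python sketch under stated assumptions. The decomposition used here (a random signed permutation $Q$, with $C_1 = CQ^\top$ and $C_2 = Q$) is a hypothetical stand-in chosen only because it guarantees $C_1C_2 = C$, and it is not necessarily the decomposition function of Definition 3.2. No training is simulated, LoRA's scaling factor is omitted, and the helper names (`decompose`, `lora_delta`, etc.) are illustrative.

```python
import random

def matmul(X, Y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

def transpose(X):
    return [list(col) for col in zip(*X)]

def randn(rows, cols, rng):
    return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]

def signed_permutation(r, rng):
    """Random signed permutation Q. It is orthogonal, so Q^T Q = I."""
    perm = list(range(r))
    rng.shuffle(perm)
    return [[(rng.choice((-1.0, 1.0)) if j == perm[i] else 0.0) for j in range(r)]
            for i in range(r)]

def decompose(C, rng):
    """Hypothetical f(C) = (C1, C2) with C1 @ C2 == C: C1 = C Q^T, C2 = Q."""
    Q = signed_permutation(len(C), rng)
    return matmul(C, transpose(Q)), Q

def lora_delta(B, C, A):
    """Weight update of a passport-carrying LoRA branch: Delta W = B C A (no scaling)."""
    return matmul(matmul(B, C), A)

rng = random.Random(0)
d_out, d_in, r = 6, 5, 3
B, A = randn(d_out, r, rng), randn(r, d_in, rng)  # stand-ins for trained LoRA factors
C = randn(r, r, rng)  # stand-in for the secret passport (kept frozen while B, A train)

# After training: fold the passport's factors into the public weights.
C1, C2 = decompose(C, rng)
B_pub, A_pub = matmul(B, C1), matmul(C2, A)

delta_train = lora_delta(B, C, A)
delta_pub = matmul(B_pub, A_pub)  # what a standard LoRA loader computes from B', A'
max_err = max(abs(x - y) for rx, ry in zip(delta_train, delta_pub) for x, y in zip(rx, ry))
print(f"max |BCA - B'A'| = {max_err:.2e}")
```

The published pair has ordinary LoRA shapes ($d_{out}\times r$ and $r\times d_{in}$), so nothing in the released files marks where the passport went; only floating-point rounding separates $B'A'$ from $BCA$.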

Abstract

Recently, LoRA and its variants have become the de facto strategy for training and sharing task-specific versions of large pretrained models, thanks to their efficiency and simplicity. However, copyright protection for LoRA weights, especially through watermark-based techniques, remains underexplored. To address this gap, we propose SEAL (SEcure wAtermarking on LoRA weights), a universal white-box watermarking method for LoRA. SEAL embeds a secret, non-trainable matrix between the trainable LoRA weights, serving as a passport to claim ownership. SEAL then entangles the passport with the LoRA weights through training, without any additional loss term for entanglement, and distributes the finetuned weights after hiding the passport. When applying SEAL, we observed no performance degradation across commonsense reasoning, textual/visual instruction tuning, and text-to-image synthesis tasks. We demonstrate that SEAL is robust against a variety of known attacks: removal, obfuscation, and ambiguity attacks.
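Why a frozen passport entangles with the weights without an extra loss can be seen from the gradients. This is a short derivation that assumes a single adapted layer with frozen base weight $W_0$ (our notation), LoRA scaling omitted, and upstream gradient $g = \partial \mathcal{L}/\partial h$:

```latex
\begin{aligned}
h &= W_0 x + B C A x, \\
\frac{\partial \mathcal{L}}{\partial B} &= g \, (C A x)^{\top}, \\
\frac{\partial \mathcal{L}}{\partial A} &= C^{\top} B^{\top} g \, x^{\top}.
\end{aligned}
```

Because $C$ appears in the forward output and in both trainable gradients, every update to $A$ and $B$ is shaped by the specific passport, which is one way to read the "forward and backward" entanglement described above.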

Paper Structure

This paper contains 72 sections, 31 equations, 9 figures, 16 tables, and 3 algorithms.

Figures (9)

  • Figure 1: Overview of SEAL. (1) We begin with LoRA’s weights ${A}$ and ${B}$, plus non-trainable passports ${C}, {C}_p$. (2) During training, ${C}$ and ${C}_p$ are inserted between ${B}$ and ${A}$, forcing the model to rely on them and thus entangling the weights with the passports. (3) Afterward, ${C}$ is factorized via $f({C})=({C}_1,{C}_2)$ and merged into ${B}$ and ${A}$, resulting in standard-looking LoRA weights ${B}'$ and ${A}'$. Meanwhile, ${C}_p$ remains private for ownership verification.
  • Figure 2: Cumulative distribution (CDF) of the negative log singular values, collected over the top-32 singular values. LoRA (blue) vs. SEAL (orange) across Llama-2, Mistral, and Gemma models.
  • Figure 3: Pruning Attack. The x-axis is the zeroing ratio, i.e., the fraction of the smallest parameters of $\mathbb{N}(B',A')$ (ranked by L1 norm) that are set to zero; the left y-axis is the fidelity score on commonsense reasoning tasks; the right y-axis is $-\log_{10}(\text{p-value})$ on a log scale. The watermark is detected when $-\log_{10}(\text{p-value})$ exceeds 3.3 (i.e., p-value $<5\times 10^{-4}$). As the zeroing ratio increases, the fidelity score decreases, yet the watermark remains detectable until 99.9% of the weights are zeroed, by which point the host task's performance is already severely degraded.
  • Figure 4: Ambiguity Attacks. Fidelity score $M_{T}(\mathbb{N}(A, B, C_{t}))$, measured as average accuracy on the commonsense reasoning tasks $T$, where $C_{t}$ is the passport used at inference time. The x-axis is the dissimilarity $\gamma$, with $C_{t} = (1 - \gamma)C_{p} + \gamma \widetilde{C}_{p\text{-adv}}$; $C_{p}$ is the concealed passport and $\widetilde{C}_{p\text{-adv}}$ is the adversary's matrix. When $\gamma > 0.6$, the fidelity score drops significantly, and the difference between fidelity scores crosses the verification threshold $\epsilon_{T}$ reported in Table \ref{tab:passport_perfomance}.
  • Figure 5: Structural obfuscation attack on the SEAL weights of Gemma-2B via SVD. Starting from the original rank of 32, the weights are obfuscated to ranks 31 down to 1.
  • ...and 4 more figures
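The attack settings in Figures 3 and 4 reduce to simple operations on the public weights and the passport. This is a minimal pure-Python sketch under stated assumptions: `magnitude_prune` ranks the individual entries of one matrix (e.g., $B'$) by absolute value, whereas the paper prunes parameters of $\mathbb{N}(B',A')$ and may rank them globally; `interpolate_passport` implements the caption's formula $C_t = (1-\gamma)C_p + \gamma\widetilde{C}_{p\text{-adv}}$ elementwise; `watermark_detected` applies only the $p < 5\times10^{-4}$ rule from Figure 3 and does not show how SEAL computes the p-value. All three helper names are hypothetical.

```python
import math

def magnitude_prune(matrix, ratio):
    """Zero the `ratio` fraction of entries with the smallest absolute value."""
    entries = sorted((abs(v), i, j) for i, row in enumerate(matrix) for j, v in enumerate(row))
    k = int(ratio * len(entries))
    pruned = [row[:] for row in matrix]
    for _, i, j in entries[:k]:
        pruned[i][j] = 0.0
    return pruned

def interpolate_passport(C_p, C_adv, gamma):
    """C_t = (1 - gamma) * C_p + gamma * C_adv, applied elementwise."""
    return [[(1.0 - gamma) * p + gamma * a for p, a in zip(row_p, row_a)]
            for row_p, row_a in zip(C_p, C_adv)]

def watermark_detected(p_value, alpha=5e-4):
    """Detection rule: p-value below alpha, i.e. -log10(p) above -log10(alpha)."""
    return -math.log10(p_value) > -math.log10(alpha)

threshold = -math.log10(5e-4)
print(f"-log10(5e-4) = {threshold:.2f}")  # prints -log10(5e-4) = 3.30

B_pub = [[0.9, -0.05, 0.3], [0.01, -1.2, 0.4]]
print(magnitude_prune(B_pub, 0.5))  # prints [[0.9, 0.0, 0.0], [0.0, -1.2, 0.4]]
```

Sweeping `ratio` toward 0.999 and recomputing fidelity and the p-value at each step mirrors the x-axis of Figure 3, while sweeping `gamma` from 0 to 1 mirrors the x-axis of Figure 4.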

Theorems & Definitions (3)

  • Definition 3.1: Entanglement
  • Definition 3.2: Decomposition Function
  • Definition 3.4: Verification Process