SEAL: Entangled White-box Watermarks on Low-Rank Adaptation
Giyeong Oh, Saejin Kim, Woohyun Cho, Sangkyu Lee, Jiwan Chung, Dokyung Song, Youngjae Yu
TL;DR
SEAL presents a universal white-box watermarking scheme for LoRA by inserting a non-trainable passport matrix $C$ between LoRA's trainable blocks $B$ and $A$, entangling the watermark with the adaptation during training and subsequently decomposing $C$ into two factors to hide the passport in public weights. The approach achieves robust ownership verification without degrading host performance and resists removal, obfuscation, and ambiguity attacks through a distributed, multi-passport framework and entanglement in both forward and backward passes. Empirical results across commonsense reasoning, textual and visual instruction tuning, and text-to-image synthesis demonstrate fidelity comparable to standard LoRA, while maintaining strong robustness under pruning and finetuning attacks and resilience to structural obfuscation. SEAL further accommodates LoRA variants and generalized low-rank operators, suggesting broad applicability to PEFT methods with minimal overhead. The work highlights practical IP protection for open-source LoRA weights, enabling verifiable ownership without sacrificing performance, and points to future extensions to additional operators and broader modular watermarking strategies.
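The hiding step described above can be illustrated with a minimal NumPy sketch. It assumes a square passport $C$ of size $r \times r$ and splits it with an SVD-based square root, which is only one possible factorization and not necessarily the one the authors use. The function name `hide_passport` and all shapes are hypothetical.

```python
import numpy as np

def hide_passport(B, A, C):
    """Fold the passport C into public factors so that B_pub @ A_pub == B @ C @ A.

    Assumption: C is split as C = C1 @ C2 via SVD; any other factorization
    with the same product would preserve the adapter's output equally well.
    """
    U, s, Vt = np.linalg.svd(C)
    root = np.sqrt(s)
    C1 = U * root             # equals U @ diag(sqrt(s))
    C2 = root[:, None] * Vt   # equals diag(sqrt(s)) @ Vt
    return B @ C1, C2 @ A

# Toy shapes (hypothetical): output dim 16, input dim 12, rank 4.
rng = np.random.default_rng(0)
d_out, d_in, r = 16, 12, 4
B = rng.normal(size=(d_out, r))   # trainable LoRA block
A = rng.normal(size=(r, d_in))    # trainable LoRA block
C = rng.normal(size=(r, r))       # secret, non-trainable passport

B_pub, A_pub = hide_passport(B, A, C)
delta_private = B @ C @ A         # update used during training
delta_public = B_pub @ A_pub      # update reconstructed from released weights
```

The released pair has the same shapes as ordinary LoRA factors and reproduces the same weight update, while $C$ itself never appears in the public weights.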
Abstract
Recently, LoRA and its variants have become the de facto strategy for training and sharing task-specific versions of large pretrained models, thanks to their efficiency and simplicity. However, the issue of copyright protection for LoRA weights, especially through watermark-based techniques, remains underexplored. To address this gap, we propose SEAL (SEcure wAtermarking on LoRA weights), a universal white-box watermarking scheme for LoRA. SEAL embeds a secret, non-trainable matrix between the trainable LoRA weights, serving as a passport to claim ownership. SEAL then entangles the passport with the LoRA weights through training, without an additional loss term for entanglement, and distributes the finetuned weights after hiding the passport. When applying SEAL, we observe no performance degradation across commonsense reasoning, textual/visual instruction tuning, and text-to-image synthesis tasks. We demonstrate that SEAL is robust against a variety of known attacks: removal, obfuscation, and ambiguity attacks.
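To see why training alone can entangle the passport without an extra loss, the following short derivation may help. It is a sketch under stated assumptions: $W$ is the frozen pretrained weight, $B$ and $A$ are the trainable LoRA weights, $C$ is the fixed passport (taken here to be square, $r \times r$), $x$ is a single input vector, and scaling factors and biases are omitted. The symbol $g$ is introduced here for illustration.

```latex
% Forward pass with the passport C placed between B and A:
h = W x + B C A x

% Let g = \partial \mathcal{L} / \partial h be the upstream gradient.
% Gradients of the task loss with respect to the trainable blocks:
\frac{\partial \mathcal{L}}{\partial B} = g \, (C A x)^{\top},
\qquad
\frac{\partial \mathcal{L}}{\partial A} = (B C)^{\top} g \, x^{\top} = C^{\top} B^{\top} g \, x^{\top}
```

Because $C$ appears in both gradients, every update to $B$ and $A$ depends on the passport, so the ordinary task loss already couples the trained weights to $C$.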
