Functional Invariants to Watermark Large Transformers

Pierre Fernandez, Guillaume Couairon, Teddy Furon, Matthijs Douze

TL;DR

The paper addresses the protection of ownership and integrity of large transformer models. It introduces a non-blind white-box watermarking method that exploits invariances in the weights to create functionally equivalent copies carrying a signature, without retraining. The core idea is to apply invertible weight transformations that can be composed (e.g., dimension permutations, invertible matrices that cancel in the QK product, and scaling/unscaling) to encode a binary watermark across layers while leaving the outputs unchanged. Watermarks are encoded as $m$ chunks of $k$ bits by selecting one of $2^k$ invariants for each chunk. Extraction picks, for each chunk, the candidate invariant that minimizes the Frobenius distance (MSE) to the observed weights, and a match is assessed with the p-value $\text{p-value}(s) = 1 - \left(1 - \mathcal{I}_{1/2^{k}}(m-s,\, s+1)\right)^N$, where $\mathcal{I}$ is the regularized incomplete beta function. Experiments on large transformers (e.g., LLaMA-family) demonstrate robustness against fine-tuning, quantization, and pruning, with minimal impact on next-token prediction utility and CPU-friendly extraction. The approach is limited to white-box scenarios and could be vulnerable if all invariants are discovered, but it establishes a practical, scalable direction for watermarking via parameter redundancy in very large networks.
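The p-value in the TL;DR can be evaluated with a few lines of standard-library Python. The sketch below rests on a reading of the formula that the summary does not spell out: $s$ is taken to be the number of chunks that do not match the expected signature, and $N$ the number of candidate signatures being tested. Under that reading, $\mathcal{I}_{1/2^{k}}(m-s, s+1)$ is the probability that at least $m-s$ of $m$ chunks match by chance, which is a binomial upper tail. The function names are hypothetical and are not taken from the paper.

```python
import math

def binomial_upper_tail(m, p, a):
    """P(X >= a) for X ~ Binomial(m, p).

    For 1 <= a <= m this equals the regularized incomplete beta
    function I_p(a, m - a + 1).
    """
    a = max(a, 0)
    return sum(math.comb(m, i) * p**i * (1.0 - p) ** (m - i)
               for i in range(a, m + 1))

def watermark_p_value(s, m, k, n_signatures=1):
    """1 - (1 - I_{1/2^k}(m - s, s + 1))^N, with s read as the number of
    mismatched chunks (an assumption) and N = n_signatures."""
    p = 1.0 / 2**k
    tail = min(binomial_upper_tail(m, p, m - s), 1.0)
    if tail >= 1.0:
        return 1.0
    # expm1/log1p keep precision when the tail probability is tiny.
    return -math.expm1(n_signatures * math.log1p(-tail))

if __name__ == "__main__":
    # 40 chunks of 8 bits, as in the Figure 2 example of 40 bytes.
    for mismatches in (0, 10, 30, 40):
        print(mismatches, watermark_p_value(mismatches, m=40, k=8))
```

With this reading, the p-value grows with the number of mismatched chunks and with the number of signatures tested, which is the behavior one would expect from a false-positive probability.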

Abstract

The rapid growth of transformer-based models increases concerns about their integrity and ownership. Watermarking addresses this issue by embedding a unique identifier into the model while preserving its performance. However, most existing approaches require optimizing the weights to imprint the watermark signal, which is not suitable at scale due to the computational cost. This paper explores watermarks with virtually no computational cost, applicable to a non-blind white-box setting (assuming access to both the original and watermarked networks). The method generates functionally equivalent copies by leveraging the models' invariance, via operations like dimension permutations or scaling/unscaling. This makes it possible to watermark models without any change in their outputs, and the watermark remains stealthy. Experiments demonstrate the effectiveness of the approach and its robustness against various model transformations (fine-tuning, quantization, pruning), making it a practical solution to protect the integrity of large models.
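To make the notion of a functionally equivalent copy concrete, here is a minimal sketch of the two invariance operations named in the abstract, applied to a toy two-layer feed-forward block $y = W_2\,\mathrm{ReLU}(W_1 x)$. The toy block, the ReLU activation, and all function names are assumptions chosen for illustration. The paper applies such operations to transformer weights, and the exact layers involved are not described in this summary.

```python
import random

def matvec(W, x):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]

def relu(v):
    return [max(0.0, a) for a in v]

def ffn(W1, W2, x):
    """Toy block: y = W2 @ relu(W1 @ x)."""
    return matvec(W2, relu(matvec(W1, x)))

def permute_hidden(W1, W2, perm):
    """Reorder the hidden units: rows of W1 and columns of W2 together."""
    W1p = [W1[j] for j in perm]
    W2p = [[row[j] for j in perm] for row in W2]
    return W1p, W2p

def scale_hidden(W1, W2, alphas):
    """Scale hidden unit i by alphas[i] > 0 in W1 and undo it in W2.

    This works here because ReLU is positively homogeneous.
    """
    W1s = [[a * w for w in row] for a, row in zip(alphas, W1)]
    W2s = [[w / a for w, a in zip(row, alphas)] for row in W2]
    return W1s, W2s

rng = random.Random(0)
d_in, d_hidden, d_out = 3, 5, 2
W1 = [[rng.gauss(0, 1) for _ in range(d_in)] for _ in range(d_hidden)]
W2 = [[rng.gauss(0, 1) for _ in range(d_hidden)] for _ in range(d_out)]
x = [rng.gauss(0, 1) for _ in range(d_in)]

perm = list(range(d_hidden))
rng.shuffle(perm)
alphas = [rng.uniform(0.5, 2.0) for _ in range(d_hidden)]

y_ref = ffn(W1, W2, x)
y_perm = ffn(*permute_hidden(W1, W2, perm), x)
y_scaled = ffn(*scale_hidden(W1, W2, alphas), x)

if __name__ == "__main__":
    print(y_ref, y_perm, y_scaled)
```

The three outputs agree up to floating-point rounding even though the stored weights differ, which is the property that lets the choice of transformation carry a signature.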

Paper Structure (11 sections, 8 equations, 2 figures, 2 tables)

This paper contains 11 sections, 8 equations, 2 figures, 2 tables.

Figures (2)

  • Figure 1: Overview. We identify each model by applying invariance operations to the original weights.
  • Figure 2: Detailed illustration of watermark insertion and extraction, with the example of permutation on $L = 40$ blocks. A user ID is a list $b_1, \dots, b_L$ of $L$ bytes, which are used to select the permutation to apply to each block $\ell$. For each $\ell$, the extraction computes the MSE between the observed weights and every permuted version of the original weights. It then selects the permutation with the minimum MSE, which in turn gives $b_\ell$.
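
The insertion and extraction loop described in the Figure 2 caption can be sketched as follows. This is an illustrative toy under several assumptions, not the authors' implementation: each block is a single small matrix whose rows are permuted, the 256 candidate permutations per block are derived from a hypothetical secret string through a seeded generator, and Gaussian noise stands in for a mild modification such as fine-tuning. In a real model, the matching permutation would also have to be applied to the dependent weights so that the outputs stay unchanged.

```python
import random

def candidate_perm(secret, block, byte, dim):
    """Permutation assigned to (block, byte); derived from a seeded RNG."""
    rng = random.Random(f"{secret}:{block}:{byte}")
    perm = list(range(dim))
    rng.shuffle(perm)
    return perm

def apply_perm(W, perm):
    """Reorder the rows of W according to perm."""
    return [W[j] for j in perm]

def mse(A, B):
    """Mean squared error between two matrices of the same shape."""
    n = sum(len(row) for row in A)
    return sum((a - b) ** 2
               for ra, rb in zip(A, B) for a, b in zip(ra, rb)) / n

def embed(original, user_id, secret):
    """Apply to each block the permutation selected by the user's byte."""
    return [apply_perm(W, candidate_perm(secret, l, b, len(W)))
            for l, (W, b) in enumerate(zip(original, user_id))]

def extract(observed, original, secret):
    """For each block, return the byte whose permutation has minimum MSE."""
    user_id = []
    for l, (W_obs, W_orig) in enumerate(zip(observed, original)):
        def distance(b):
            perm = candidate_perm(secret, l, b, len(W_orig))
            return mse(W_obs, apply_perm(W_orig, perm))
        user_id.append(min(range(256), key=distance))
    return user_id

rng = random.Random(1)
n_blocks, rows, cols = 4, 16, 8  # toy sizes; Figure 2 uses 40 blocks
original = [[[rng.gauss(0, 1) for _ in range(cols)] for _ in range(rows)]
            for _ in range(n_blocks)]
user_id = [17, 203, 0, 255]

marked = embed(original, user_id, secret="demo-secret")
# Small perturbation standing in for a modification of the marked weights.
noisy = [[[w + rng.gauss(0, 0.01) for w in row] for row in W] for W in marked]

if __name__ == "__main__":
    print(extract(noisy, original, secret="demo-secret"))
```

Because extraction compares the observed weights against permuted copies of the original weights, it needs access to the original model, which is why the setting is described as non-blind.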