TokenMark: A Modality-Agnostic Watermark for Pre-trained Transformers

Hengyuan Xu; Liyao Xiang; Borui Yang; Xingjun Ma; Siheng Chen; Baochun Li

TL;DR

TokenMark addresses the lack of modality-agnostic watermarking for pre-trained transformers by leveraging permutation equivariance to embed a secondary weight set that is activated only by permuted inputs. Because the watermark is tied to the model's structure rather than to a modality-specific trigger, it is tightly intertwined with the weights, which enables robust extraction while preserving the main functionality. Experiments across CV and NLP backbones show that TokenMark achieves near-perfect watermark extraction, maintains fidelity, and resists fine-tuning, pruning, quantization, and extraction attacks. TokenMark can also serve as a universal plugin for existing backdoor-based watermarking schemes, offering scalable IP protection for pre-trained models across modalities.

Abstract

Watermarking is a critical tool for model ownership verification. However, existing watermarking techniques are often designed for specific data modalities and downstream tasks, without considering the inherent architectural properties of the model. This lack of generality and robustness underscores the need for a more versatile watermarking approach. In this work, we investigate the properties of Transformer models and propose TokenMark, a modality-agnostic, robust watermarking system for pre-trained models, leveraging the permutation equivariance property. TokenMark embeds the watermark by fine-tuning the pre-trained model on a set of specifically permuted data samples, resulting in a watermarked model that contains two distinct sets of weights -- one for normal functionality and the other for watermark extraction, the latter triggered only by permuted inputs. Extensive experiments on state-of-the-art pre-trained models demonstrate that TokenMark significantly improves the robustness, efficiency, and universality of model watermarking, highlighting its potential as a unified watermarking solution.

Paper Structure

This paper contains 30 sections, 2 theorems, 18 equations, 8 figures, 9 tables, and 2 algorithms.

Introduction
Related Works
Preliminaries
Watermarking Requirements
Backdoor-based Watermarking
Permutation Properties of Transformers
Use Case and Threat Model
Methodology
Permutation Equivariance
Design of TokenMark
Backdoor Watermarking: TokenMark-B
SSL Watermarking: TokenMark-S
Discussion
Experiments
Setup
...and 15 more sections

Key Result

Theorem 5.1

The Transformer backbone $F(\cdot)$ is permutation-equivariant in forward propagation, where $\theta$ and $P(\theta)$ denote the original and permuted model weights, respectively.
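
As a rough illustration of the kind of property this theorem refers to, the sketch below numerically checks two permutation properties of a single self-attention layer without positional encoding. The single-layer setup, the function names, the dimensions, and in particular the way the weights are permuted (`p_c.T @ w` and `p_c.T @ w_v @ p_c`) are assumptions made for this example; they are not taken from the paper, whose exact statement covers the full backbone.

```python
import numpy as np

def softmax(z):
    # Row-wise softmax with max subtraction for numerical stability.
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def self_attention(x, w_q, w_k, w_v):
    # Single-head scaled dot-product self-attention, no positional encoding.
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])
    return softmax(scores) @ v

rng = np.random.default_rng(0)
n_tokens, d = 5, 8  # illustrative sizes (assumption)
x = rng.normal(size=(n_tokens, d))
w_q, w_k, w_v = (rng.normal(size=(d, d)) for _ in range(3))

# Random permutation matrices over tokens (rows) and features (columns).
p_r = np.eye(n_tokens)[rng.permutation(n_tokens)]
p_c = np.eye(d)[rng.permutation(d)]

y = self_attention(x, w_q, w_k, w_v)

# Row (token) permutation: the same weights give a row-permuted output.
y_row = self_attention(p_r @ x, w_q, w_k, w_v)
row_equivariant = np.allclose(y_row, p_r @ y)

# Column (feature) permutation: the weights must be permuted to match.
# This particular weight permutation is an assumption for the sketch.
y_col = self_attention(x @ p_c, p_c.T @ w_q, p_c.T @ w_k, p_c.T @ w_v @ p_c)
col_equivariant = np.allclose(y_col, y @ p_c)

print(row_equivariant, col_equivariant)
```

The column case is the one that involves a second, permuted set of weights: the permuted input is processed correctly only by the correspondingly permuted parameters, which is the intuition behind having one weight set for normal inputs and another triggered by permuted inputs.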

Figures (8)

Figure 1: TokenMark is a modality-agnostic, robust, and lightweight watermark for pre-trained models. It has a wider application range and can serve as a universal plugin that replaces the trigger in backdoor-based watermarking systems to enhance robustness against various removal attacks.
Figure 2: Robustness of backdoor-based watermarking against fine-tuning attacks on representative CNN and Transformer models, e.g., ResNet and ViT. "Adi" denotes the method of Adi et al. (2018); the remaining setup is provided in the appendix.
Figure 3: Comparison between the traditional watermarking scheme and TokenMark.
Figure 4: Loss curves of fine-tuning with original backbone and watermarked backbone. CV models are fine-tuned on Tiny ImageNet and NLP models are fine-tuned on IMDB.
Figure 5: Performance of different watermarking schemes against white-box watermark removal attacks. Regions where TokenMark exceeds the baseline are colored green; all other regions are blue. The first three columns are fine-tuning attacks using different datasets, and the last two are pruning and quantization attacks, respectively.
...and 3 more figures

Theorems & Definitions (4)

Theorem 5.1: Forward equivariance
Theorem 5.2: Backward equivariance (arXiv:2304.07735)
Definition 5.3: Fidelity
Definition 5.4: Effectiveness
