TokenMark: A Modality-Agnostic Watermark for Pre-trained Transformers
Hengyuan Xu, Liyao Xiang, Borui Yang, Xingjun Ma, Siheng Chen, Baochun Li
TL;DR
TokenMark addresses the lack of modality-agnostic watermarking for pre-trained transformers. It leverages permutation equivariance to embed a secondary set of weights that is activated only by permuted inputs. Because this approach is driven by the model's structure, the watermark is tightly intertwined with the model's weights, enabling robust extraction while preserving the main functionality. Extensive experiments across CV and NLP backbones show that TokenMark achieves near-perfect watermark extraction, maintains fidelity, and resists fine-tuning, pruning, quantization, and model extraction attacks. TokenMark can also serve as a universal plug-in for existing watermarking schemes, offering scalable IP protection for multi-modal pre-trained models.
Abstract
Watermarking is a critical tool for model ownership verification. However, existing watermarking techniques are often designed for specific data modalities and downstream tasks, without considering the inherent architectural properties of the model. This lack of generality and robustness underscores the need for a more versatile watermarking approach. In this work, we investigate the properties of Transformer models and propose TokenMark, a modality-agnostic, robust watermarking system for pre-trained models that leverages the permutation equivariance property. TokenMark embeds the watermark by fine-tuning the pre-trained model on a set of specifically permuted data samples. The resulting watermarked model contains two distinct sets of weights: one for normal functionality and one for watermark extraction, the latter triggered only by permuted inputs. Extensive experiments on state-of-the-art pre-trained models demonstrate that TokenMark significantly improves the robustness, efficiency, and universality of model watermarking, highlighting its potential as a unified watermarking solution.
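The permutation equivariance property that the abstract relies on can be illustrated with a small numerical check. The sketch below is not the TokenMark embedding procedure. It assumes a single-head self-attention layer with no positional encoding and random weights, and it only covers permutation of the token order. All function and variable names (`self_attention`, `permute_rows`, `max_gap`, and so on) are hypothetical and chosen for illustration. The check confirms that permuting the input tokens and then applying attention gives the same result as applying attention and then permuting the output.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    return [list(col) for col in zip(*m)]


def softmax_rows(m):
    """Apply a numerically stable softmax to each row."""
    out = []
    for row in m:
        peak = max(row)
        exps = [math.exp(v - peak) for v in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def self_attention(x, wq, wk, wv):
    """Single-head self-attention without positional encoding (an assumption)."""
    q, k, v = matmul(x, wq), matmul(x, wk), matmul(x, wv)
    scale = math.sqrt(len(wq[0]))
    scores = [[s / scale for s in row] for row in matmul(q, transpose(k))]
    return matmul(softmax_rows(scores), v)


def permute_rows(m, perm):
    """Reorder the rows (tokens) of m so that new row j is old row perm[j]."""
    return [m[i] for i in perm]


def random_matrix(rows, cols, rng):
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
n_tokens, dim = 5, 4
x = random_matrix(n_tokens, dim, rng)
wq, wk, wv = (random_matrix(dim, dim, rng) for _ in range(3))

perm = list(range(n_tokens))
rng.shuffle(perm)

# Equivariance: attention(P x) should equal P attention(x).
attend_then_permute = permute_rows(self_attention(x, wq, wk, wv), perm)
permute_then_attend = self_attention(permute_rows(x, perm), wq, wk, wv)

max_gap = max(
    abs(a - b)
    for row_a, row_b in zip(attend_then_permute, permute_then_attend)
    for a, b in zip(row_a, row_b)
)
print("equivariant up to rounding:", max_gap < 1e-9)
```

The two results agree only up to floating-point rounding, because the permuted computation sums the same terms in a different order. That is why the comparison uses a tolerance rather than exact equality.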
