ActiveMark: on watermarking of visual foundation models via massive activations
Anna Chistyakova, Mikhail Pautov
TL;DR
ActiveMark addresses intellectual property protection for visual foundation models (VFMs) by embedding binary watermarks into expressive hidden representations, using a lightweight encoder-decoder and limited fine-tuning. The method uses massive activations to identify the blocks best suited for embedding, and it defines a training loss, evaluation metrics, and a detection threshold chosen to keep the probabilities of false positives and false negatives low, including after downstream fine-tuning and pruning. Experiments on CLIP and DINOv2 show high detection rates for watermarked models, including after fine-tuning on downstream tasks, and better results than baseline watermarking methods. The approach is presented as model-agnostic and practical, with reproducibility resources and theoretical bounds supporting its application to a range of VFMs and architectures.
Abstract
Trained on large-scale datasets, visual foundation models (VFMs) can be fine-tuned for diverse downstream tasks, achieving remarkable performance and efficiency in various computer vision applications. The high cost of data collection and training motivates the owners of some VFMs to distribute them under a license that protects their intellectual property rights. However, a dishonest user of a protected model's copy may illegally redistribute it, for example, for profit. Consequently, reliable ownership verification tools are of great importance, since they can distinguish a redistributed copy of the protected model from an independent model. In this paper, we propose an approach to ownership verification of visual foundation models: we fine-tune a small set of expressive layers of a VFM together with a small encoder-decoder network to embed digital watermarks into the internal representations of a hold-out set of input images. Importantly, the embedded watermarks remain detectable in functional copies of the protected model, obtained, for example, by fine-tuning the VFM for a particular downstream task. Theoretically and experimentally, we demonstrate that the proposed method yields a low probability of falsely detecting a watermark in a non-watermarked model and a low probability of failing to detect the watermark in a watermarked model.
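To make the notion of a low false-detection probability concrete, the following is a minimal sketch of threshold-based detection for a binary watermark. It is not the paper's procedure or its bound: it assumes, for illustration only, that an independent (non-watermarked) model reproduces each watermark bit independently with probability 1/2, so that the number of matching bits follows a binomial distribution. The function names `false_positive_prob`, `min_matches_for`, and `is_watermarked` are hypothetical.

```python
from math import comb


def false_positive_prob(n_bits: int, min_matches: int) -> float:
    """P[Binomial(n_bits, 1/2) >= min_matches].

    Assumption (not from the paper): an independent model matches
    each watermark bit independently with probability 1/2.
    """
    tail = sum(comb(n_bits, k) for k in range(min_matches, n_bits + 1))
    return tail / 2 ** n_bits


def min_matches_for(n_bits: int, max_fpr: float) -> int:
    """Smallest match count whose false-positive probability is <= max_fpr.

    Returns n_bits + 1 (a threshold that can never be met) when even a
    perfect match is more likely than max_fpr under the assumption above.
    """
    for threshold in range(n_bits + 1):
        if false_positive_prob(n_bits, threshold) <= max_fpr:
            return threshold
    return n_bits + 1


def is_watermarked(extracted, reference, threshold: int) -> bool:
    """Flag a model when enough extracted bits agree with the reference."""
    matches = sum(int(a == b) for a, b in zip(extracted, reference))
    return matches >= threshold


# Toy usage with a 4-bit watermark and a 10% false-positive budget.
threshold = min_matches_for(4, 0.1)
print(threshold, is_watermarked([1, 0, 1, 1], [1, 0, 1, 1], threshold))
```

Under this assumption, longer watermarks allow a threshold well below a perfect match while keeping the false-positive probability small, which leaves room for bit errors introduced by fine-tuning or pruning.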
