ActiveMark: on watermarking of visual foundation models via massive activations

Anna Chistyakova, Mikhail Pautov

TL;DR

ActiveMark addresses IP protection for visual foundation models (VFMs) by embedding binary watermarks into expressive hidden representations using a lightweight encoder/decoder and limited fine-tuning. The method uses massive activations to identify suitable embedding blocks, and it formalizes a loss, evaluation metrics, and a detection threshold so that detection stays reliable, with low false-positive and false-negative rates, under downstream fine-tuning and pruning. Empirical results on CLIP and DINOv2 show high detection rates for watermarked models and robustness to downstream tasks, outperforming baseline watermarking methods. The approach is model-agnostic and practical, with reproducibility resources and theoretical bounds supporting its applicability to a range of VFMs and architectures.

Abstract

Trained on large datasets, visual foundation models (VFMs) can be fine-tuned for diverse downstream tasks, achieving remarkable performance and efficiency in various computer vision applications. The high cost of data collection and training motivates the owners of some VFMs to distribute them under a license that protects their intellectual property rights. However, a dishonest user of a copy of the protected model may illegally redistribute it, for example, to make a profit. As a consequence, reliable ownership verification tools are of great importance, since such methods can be used to distinguish a redistributed copy of the protected model from an independent model. In this paper, we propose an approach to ownership verification of visual foundation models that fine-tunes a small set of expressive layers of a VFM, along with a small encoder-decoder network, to embed digital watermarks into an internal representation of a hold-out set of input images. Importantly, the embedded watermarks remain detectable in functional copies of the protected model, obtained, for example, by fine-tuning the VFM for a particular downstream task. Theoretically and experimentally, we demonstrate that the proposed method yields a low probability of falsely detecting a watermark in a non-watermarked model and a low probability of failing to detect it in a watermarked model.

Paper Structure

This paper contains 20 sections, 14 equations, 3 figures, 3 tables.

Figures (3)

  • Figure 1: Schematic illustration of the proposed method. To embed the watermark, we pass an image $x$ through the first part of the VFM to obtain its latent representation $p(x)$. Then, the first channel of $p(x)$, namely $p_1(x)$, is concatenated with the watermark $m$ and passed to the encoder, which produces the vector $e(\texttt{concat}(p_1(x),m))$. This vector replaces the first channel of the original internal image representation $p(x)$, and the updated representation $\tilde{p}(x)$ is passed to the latter part of the VFM. To extract the watermark, we use a decoding network $d$ that maps the output of the VFM, $u = q(\tilde{p}(x))$, to the binary message $m'$, where $m'_i = \mathds{1}(d(u)_i \ge 1/2)$. Both the encoder and the decoder consist of two fully connected layers.
  • Figure 2: The average magnitudes of activations of blocks of the source VFMs. It is noteworthy that starting from a particular block, the magnitudes of activations increase drastically, namely, starting from block $18$ of DINOv2 and from block $12$ of CLIP.
  • Figure 3: Watermark detection rate $R$ from equation \ref{eq:R}, averaged over $N=1000$ images used for watermarking. Classification (const) and classification (cosine) experiments were conducted on the E-commerce Product Images dataset, classification (linear) experiments were conducted on the Oxford-IIIT Pet dataset, and a segmentation experiment was conducted on the FoodSeg103 dataset.
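
The data flow described in the Figure 1 caption can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions, not the authors' implementation: the tensor sizes, the ReLU between the two fully connected layers, the sigmoid applied before thresholding, the untrained random weights, and the stand-in function `q` for the latter part of the VFM are all hypothetical choices made for the example. The "first channel" $p_1(x)$ is also assumed to be the first row of a (channels, dim) array.

```python
import numpy as np

rng = np.random.default_rng(0)

def two_layer_mlp(in_dim, hidden_dim, out_dim):
    """Two fully connected layers with random, untrained weights (illustration only)."""
    w1 = rng.normal(0.0, 1.0 / np.sqrt(in_dim), (in_dim, hidden_dim))
    w2 = rng.normal(0.0, 1.0 / np.sqrt(hidden_dim), (hidden_dim, out_dim))
    def forward(v):
        # ReLU between the layers is an assumption, not stated in the caption.
        return np.maximum(v @ w1, 0.0) @ w2
    return forward

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical sizes: 4 channels of width 16, an 8-bit watermark.
channels, dim, n_bits = 4, 16, 8

encoder = two_layer_mlp(dim + n_bits, 32, dim)      # e(concat(p_1(x), m))
decoder = two_layer_mlp(channels * dim, 32, n_bits)  # d(u)

def embed(p, m):
    """Replace the first channel of p(x) with the encoder output."""
    p_tilde = p.copy()
    p_tilde[0] = encoder(np.concatenate([p[0], m]))
    return p_tilde

def q(p_tilde):
    """Stand-in for the latter part of the VFM (hypothetical)."""
    return np.tanh(p_tilde).reshape(-1)

def extract(u):
    """Threshold the decoder output at 1/2 to obtain the binary message m'."""
    return (sigmoid(decoder(u)) >= 0.5).astype(int)

p = rng.normal(size=(channels, dim))   # plays the role of p(x)
m = rng.integers(0, 2, size=n_bits)    # binary watermark
p_tilde = embed(p, m)
m_prime = extract(q(p_tilde))

# Fraction of watermark bits recovered for this single input.
bit_agreement = float(np.mean(m_prime == m))
```

Because the weights here are random, `bit_agreement` carries no meaning beyond chance; in the method as described, the encoder, the decoder, and a small set of VFM layers are fine-tuned so that $m'$ matches $m$ on the hold-out images.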

Theorems & Definitions (1)

  • Remark