Fragile Model Watermark for integrity protection: leveraging boundary volatility and sensitive sample-pairing

ZhenZhe Gao, Zhenjun Tang, Zhaoxia Yin, Baoyuan Wu, Yue Lu

TL;DR

This work uses a sample-pairing technique that places the model's decision boundary between the two samples of each pair while simultaneously maximizing their logits. As a result, the model's decisions on the sensitive samples change as much as possible, and their Top-1 labels flip easily regardless of the direction in which the boundary moves.

Abstract

Neural networks increasingly influence people's lives. Ensuring that a neural network is deployed faithfully, as designed by its model owner, is crucial, because models are susceptible to various malicious or unintentional modifications, such as backdooring and poisoning attacks. Fragile model watermarks aim to prevent unexpected tampering that could lead DNN models to make incorrect decisions, and they are designed to detect any tampering with the model as sensitively as possible. However, prior watermarking methods suffered from inefficient sample generation and insufficient sensitivity, limiting their practical applicability. Our approach uses a sample-pairing technique that places the model's decision boundary between the two samples of each pair while simultaneously maximizing their logits. This ensures that the model's decisions on the sensitive samples change as much as possible and that their Top-1 labels flip easily regardless of the direction in which the boundary moves.

Paper Structure

This paper contains 12 sections, 7 equations, 3 figures, 5 tables.

Figures (3)

  • Figure 1: Demonstration of model boundary alterations and the positions of sample pairs selected by the proposed method. The circles represent different samples. Here, 'activate level' indicates how strongly the logits of samples near the model boundary are activated, a concept further elaborated in Section 3.B.
  • Figure 2: Overall framework for generating sensitive samples. In stage 0, the model supplier records users and their corresponding keys; when a user needs to check the model, a weight matrix is generated from that user's key to add one additional binary classification layer, which isolates sensitive samples between users. In stage 1, with the additional binary classification layer in place, we use a combined_loss to optimize random pixels, with the objective of bringing the sensitive samples close to the model boundary while maximizing neuron activation. Finally, in stage 2, using a method similar to adversarial attacks with a very small learning rate, we cross the model boundary and record the two sensitive samples before and after crossing to form a sample pair.
  • Figure 3: The Success Rate (%) of Detecting Sensitive Samples After Pruning the Models.
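
Stages 1 and 2 from the Figure 2 caption can be illustrated with a minimal sketch in plain Python. It is not the authors' implementation: the toy two-class linear model, the specific loss `(z0 - z1)^2 - lam * (z0 + z1)`, and every name (`stage1`, `stage2`, `is_tampered`, `lam`, the weights `W` and `b`) are assumptions made for illustration, and the key-derived binary classification layer of stage 0 is omitted. The sketch pulls a sample toward the boundary while raising its logits, steps across the boundary with a very small learning rate, and records the samples just before and just after the label flip as a pair.

```python
def logits(W, b, x):
    """Logits of a toy linear classifier: z_k = W[k] . x + b[k]."""
    return [sum(w * xi for w, xi in zip(row, x)) + bk for row, bk in zip(W, b)]

def top1(W, b, x):
    """Index of the largest logit (ties resolve to the lower index)."""
    z = logits(W, b, x)
    return max(range(len(z)), key=lambda k: z[k])

def clip01(x):
    """Keep pixel values in the valid range [0, 1]."""
    return [min(1.0, max(0.0, xi)) for xi in x]

def stage1(W, b, x, steps=500, lr=0.05, lam=0.01):
    """Gradient descent on the assumed loss (z0 - z1)^2 - lam * (z0 + z1):
    the first term pulls x toward the class-0/class-1 boundary,
    the second term raises both logits."""
    for _ in range(steps):
        z = logits(W, b, x)
        margin = z[0] - z[1]
        grad = [2.0 * margin * (W[0][i] - W[1][i]) - lam * (W[0][i] + W[1][i])
                for i in range(len(x))]
        x = clip01([xi - lr * g for xi, g in zip(x, grad)])
    return x

def stage2(W, b, x, lr=1e-4, max_steps=100_000):
    """Take tiny steps toward the other class until the Top-1 label flips.
    Returns the samples just before and just after the crossing."""
    start = top1(W, b, x)
    other = 1 - start
    direction = [W[other][i] - W[start][i] for i in range(len(x))]
    for _ in range(max_steps):
        nxt = clip01([xi + lr * d for xi, d in zip(x, direction)])
        if top1(W, b, nxt) != start:
            return x, nxt
        x = nxt
    raise RuntimeError("boundary not crossed within max_steps")

def is_tampered(W, b, pair, recorded_labels):
    """Flag the model if either sample of the pair changed its Top-1 label."""
    return any(top1(W, b, x) != y for x, y in zip(pair, recorded_labels))

# Hypothetical toy model with 4 "pixels" and 2 classes.
W = [[1.0, -0.5, 0.3, 0.2], [-0.4, 0.8, 0.1, 0.6]]
b = [0.0, 0.0]
pair = stage2(W, b, stage1(W, b, [0.5, 0.5, 0.5, 0.5]))
labels = [top1(W, b, x) for x in pair]
```

Because the two samples sit on opposite sides of the boundary and very close to it, a shift of the boundary in either direction changes the label of one of them, which is what `is_tampered` checks against the recorded labels.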