Revisiting the Information Capacity of Neural Network Watermarks: Upper Bound Estimation and Beyond

Fangqi Li; Haodong Zhao; Wei Du; Shilin Wang

TL;DR

This work introduces an information-theoretic capacity framework for DNN watermarks, defining $C(\delta,L)$ to capture how much identity information can be reliably transmitted under a tolerated degradation $\delta$. It provides a capacity-estimation algorithm to obtain tight upper bounds $\hat{C}(\delta,L)$ under adversarial overwriting and a universal, non-invasive approach called multiple rounds of ownership verification (MROV) to push beyond single-round limits, with a variational extension (MROV-V) to broaden applicability. The authors validate the framework across multiple watermarking schemes, showing how capacity depends on the fidelity-robustness tradeoff and how MROV and MROV-V can enhance verifiability while controlling performance loss. Overall, the study offers a principled, quantifiable approach for IP protection of DNNs and practical guidance for designing watermarking schemes that balance integrity, robustness, and efficiency.

Abstract

To trace the copyright of deep neural networks, an owner can embed its identity information into its model as a watermark. The capacity of the watermark quantifies the maximal volume of information that can be verified from the watermarked model. Current studies on capacity focus on the ownership verification accuracy under ordinary removal attacks and fail to capture the relationship between robustness and fidelity. This paper studies the capacity of deep neural network watermarks from an information-theoretic perspective. We propose a new definition of deep neural network watermark capacity analogous to channel capacity, analyze its properties, and design an algorithm that yields a tight estimation of its upper bound under adversarial overwriting. We also propose a universal non-invasive method to secure the transmission of the identity message beyond capacity by multiple rounds of ownership verification. Our observations provide evidence for neural network owners and defenders who are curious about the tradeoff between the integrity of their ownership and the performance degradation of their products.
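As a rough illustration of why multiple rounds of ownership verification can help, the sketch below computes the per-bit error of a majority vote over several rounds. It is not the paper's MROV procedure: the function name `majority_error` is hypothetical, and the sketch assumes that each round retrieves each bit wrongly with the same probability `p`, independently of the other rounds, which an adaptive adversary need not respect.

```python
from math import comb


def majority_error(p: float, rounds: int) -> float:
    """Probability that a per-bit majority vote over `rounds` rounds is wrong.

    Assumption (not from the paper): each round flips the bit independently
    with probability `p`. `rounds` must be odd so that ties cannot occur.
    """
    if rounds < 1 or rounds % 2 == 0:
        raise ValueError("rounds must be a positive odd integer")
    # The vote fails when more than half of the rounds return the wrong bit.
    needed = rounds // 2 + 1
    return sum(
        comb(rounds, k) * p**k * (1.0 - p) ** (rounds - k)
        for k in range(needed, rounds + 1)
    )


if __name__ == "__main__":
    for n in (1, 3, 5, 7):
        print(n, majority_error(0.2, n))
```

Under these assumptions the aggregated error shrinks with the number of rounds whenever `p < 0.5` and stays at 0.5 when `p = 0.5`, so aggregation cannot rescue a bit that the attack has fully randomized.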

Paper Structure

This paper contains 16 sections, 5 theorems, 28 equations, 11 figures, 4 tables, and 1 algorithm.

Introduction
Preliminaries
DNN watermark
Information capacity of watermark
Related works
Information Capacity of DNN Watermark
Definition
Capacity estimation
Breaking the capacity bottleneck
Experiments and Discussions
Settings
Capacity estimation
Efficacy of MROV-V
Conclusions
Case Study of Theorem 3 on Classifiers
...and 1 more section

Key Result

Theorem 1

(Monotonicity) $0 \leq C(\delta,L) \leq L$. $C(\delta,L)$ decreases in $\delta$. $C(\delta,L)$ increases in $L$ if each bit of the identity message is independently embedded and retrieved.
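The three properties in Theorem 1 can be checked on a toy model. The sketch below is not the paper's definition of $C(\delta,L)$: it assumes each of the $L$ bits passes through an independent binary symmetric channel, and it uses a hypothetical linear map (`sensitivity * delta`, capped at 0.5) from the tolerated degradation $\delta$ to the bit error rate an adversary can induce. The names `bsc_capacity`, `toy_capacity`, and `sensitivity` are illustrative only.

```python
import math


def binary_entropy(p: float) -> float:
    """Binary entropy H(p) in bits, with H(0) = H(1) = 0."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)


def bsc_capacity(bit_error_rate: float, length: int) -> float:
    """Capacity in bits of `length` independent binary symmetric channels."""
    return length * (1.0 - binary_entropy(bit_error_rate))


def toy_capacity(delta: float, length: int, sensitivity: float = 2.0) -> float:
    """Toy stand-in for C(delta, L).

    Assumption (not from the paper): an adversary tolerating degradation
    `delta` can flip each bit with probability min(0.5, sensitivity * delta).
    """
    bit_error_rate = min(0.5, sensitivity * delta)
    return bsc_capacity(bit_error_rate, length)


if __name__ == "__main__":
    for length in (256, 512, 1024, 2048):
        row = [round(toy_capacity(d, length), 1) for d in (0.0, 0.05, 0.1, 0.25)]
        print(length, row)
```

In this toy model the capacity lies in $[0, L]$, falls as $\delta$ grows, and scales linearly with $L$ because the bits are treated as independent, which matches the condition stated in the theorem.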

Figures (11)

Figure 1: The workflow of a DNN watermarking scheme.
Figure 2: The performance degradation vs. the length of the identity message. The blue area and the red area represent the cost in fidelity and robustness, respectively.
Figure 3: The contours denote levels of performance degradation. (a) An adversarial modification. (b) Increasing robustness as a defense. (c) Averaging multiple rounds of ownership verification as a defense. $M_{\text{WM}}\!+\!\theta_{1}$ denotes a failed attack. $M_{\text{WM}}\!+\!\theta_{2}$ denotes a successful attack.
Figure 4: Multiple rounds of ownership verification, conducted by the judge. (a) can only be applied to white-box schemes. (b) can be applied to any scheme.
Figure 5: Estimated capacity $\hat{C}(\delta,L)$ as a function of the performance degradation $\delta$ under fine-tuning, neuron-pruning, and adversarial overwriting. The length of the identity message $L$ is set to 256, 512, 1024, and 2048.
...and 6 more figures

Theorems & Definitions (10)

Theorem 1
proof
Theorem 2
proof
Theorem 3
proof
Theorem 4
proof
Theorem 5
proof
