Robust and Large-Payload DNN Watermarking via Fixed, Distribution-Optimized, Weights
Benedetta Tondi, Andrea Costanzo, Mauro Barni
TL;DR
The paper tackles robust, high-payload white-box watermarking for deep neural networks by fixing watermarked weights prior to training and freezing them, while learning the remainder of the network. It encodes the watermark with direct-sequence spread-spectrum using a secret key and optimizes the watermarked-weight distribution to minimize divergence from the non-watermarked weights, showing that the optimal distribution is a zero-mean Laplace distribution. Empirically, the approach achieves very large payloads with negligible impact on primary task accuracy and demonstrates strong robustness to pruning, quantization, retraining, and transfer learning, outperforming existing methods in secrecy and scalability. The work provides a practical, theoretically grounded method for DNN watermarking with significant real-world implications for IP protection and model provenance while outlining future directions in defense against informed attackers and potential channel-coding enhancements.
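The spread-spectrum encoding summarized above can be illustrated with a minimal sketch. It is not the paper's exact construction: the function names, the `spread`, `scale`, and `seed` parameters, the sign-based modulation, and the correlation decoder are all assumptions made for illustration. Each bit is spread over `spread` weights using key-derived +/-1 chips, and the weight magnitudes are drawn so that the resulting values follow a zero-mean Laplace distribution.

```python
import numpy as np

def spreading_sequence(key: int, n_bits: int, spread: int) -> np.ndarray:
    """Pseudo-random +/-1 chips derived from the secret key, shape (n_bits, spread)."""
    rng = np.random.default_rng(key)
    return rng.choice(np.array([-1.0, 1.0]), size=(n_bits, spread))

def embed(bits, key: int, spread: int, scale: float, seed: int = 0) -> np.ndarray:
    """Build watermarked weight values (hypothetical helper, not from the paper)."""
    bits = np.asarray(bits)
    chips = spreading_sequence(key, bits.size, spread)
    antipodal = 2.0 * bits - 1.0  # map {0, 1} to {-1, +1}
    # The magnitude of a Laplace(0, scale) variable is exponential with mean `scale`;
    # multiplying by a symmetric random sign gives Laplace(0, scale) weights.
    magnitudes = np.random.default_rng(seed).exponential(scale, size=chips.shape)
    return antipodal[:, None] * chips * magnitudes

def decode(weights: np.ndarray, key: int) -> np.ndarray:
    """Recover bits by correlating each row of weights with its chip sequence."""
    n_bits, spread = weights.shape
    chips = spreading_sequence(key, n_bits, spread)
    corr = (weights * chips).sum(axis=1)
    return (corr > 0).astype(int)

# Usage: a 64-bit message, each bit spread over 32 weights.
bits = np.random.default_rng(1).integers(0, 2, size=64)
w = embed(bits, key=1234, spread=32, scale=0.05)
recovered = decode(w, key=1234)
```

In this sketch, decoding with the correct key recovers the message, while a different key yields chips that are uncorrelated with the embedded ones. Longer spreading sequences trade payload for a larger correlation margin against weight perturbations.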
Abstract
The design of an effective multi-bit watermarking algorithm hinges upon finding a good trade-off between the three fundamental requirements forming the watermarking trade-off triangle, namely, robustness against network modifications, payload, and unobtrusiveness, the last ensuring minimal impact on the performance of the watermarked network. In this paper, we first revisit the nature of the watermarking trade-off triangle for the DNN case, then we exploit our findings to propose a white-box, multi-bit watermarking method achieving very large payload and strong robustness against network modifications. In the proposed system, the weights hosting the watermark are set prior to training, making sure that their amplitude is large enough to bear the target payload and survive network modifications, notably retraining, and are left unchanged throughout the training process. The distribution of the weights carrying the watermark is theoretically optimised to ensure the secrecy of the watermark and make sure that the watermarked weights are indistinguishable from the non-watermarked ones. The proposed method has no significant impact on network accuracy and is robust against network modifications, retraining, and transfer learning, while ensuring a payload that is out of reach of state-of-the-art methods, whose robustness is lower or at most comparable.
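The idea of setting the watermarked weights before training and leaving them unchanged can be sketched on a toy model. The example below is an assumption-laden illustration, not the paper's training procedure: it uses a linear least-squares model rather than a DNN, and the function `train_with_frozen_weights`, the mask, and the pre-set values are hypothetical. Gradient updates are applied only to the non-watermarked positions.

```python
import numpy as np

def train_with_frozen_weights(X, y, w_init, frozen_mask, lr=0.05, steps=200):
    """Gradient descent on mean squared error, skipping the frozen positions."""
    w = w_init.copy()
    trainable = ~frozen_mask
    for _ in range(steps):
        grad = 2.0 * X.T @ (X @ w - y) / len(y)
        w[trainable] -= lr * grad[trainable]  # frozen entries are never updated
    return w

rng = np.random.default_rng(0)
X = rng.normal(size=(256, 20))
y = X @ rng.normal(size=20)

w0 = rng.normal(scale=0.1, size=20)
frozen = np.zeros(20, dtype=bool)
frozen[:5] = True
# Hypothetical pre-set watermarked values, fixed before training starts.
w0[frozen] = np.array([0.3, -0.3, 0.3, 0.3, -0.3])

w_trained = train_with_frozen_weights(X, y, w0, frozen)
loss_before = np.mean((X @ w0 - y) ** 2)
loss_after = np.mean((X @ w_trained - y) ** 2)
```

Because the frozen entries are excluded from every update, they are identical before and after training, while the remaining weights adapt around them to reduce the loss.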
