Robust and Imperceptible Black-box DNN Watermarking Based on Fourier Perturbation Analysis and Frequency Sensitivity Clustering
Yong Liu, Hanzhou Wu, Xinpeng Zhang
TL;DR
This work addresses IP protection for DNNs under black-box access by embedding trigger-based watermarks in the frequency domain. It introduces Fourier perturbation analysis to identify mid-low frequency components that balance imperceptibility and robustness, and a frequency sensitivity clustering step to select perturbable frequencies in a model-aware manner. Watermark embedding proceeds by retraining a marked model with trigger samples labeled as a new class, while ownership verification uses trigger-set predictions under thresholding and considers attacks on the trigger data. Experimental results across CIFAR-10, CIFAR-100, and GTSRB show high task fidelity, high trigger accuracy, and strong robustness against common attacks, outperforming prior methods in imperceptibility and resilience.
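The thresholded verification step can be illustrated with a minimal sketch. It assumes black-box access through a hypothetical `predict` callable that returns a class index, and that trigger samples were labeled as one extra class during retraining. The names `verify_ownership` and `watermark_class`, and the default threshold of 0.9, are illustrative assumptions, not values taken from the paper.

```python
from typing import Callable, Sequence, Tuple


def verify_ownership(
    predict: Callable[[object], int],
    trigger_samples: Sequence[object],
    watermark_class: int,
    threshold: float = 0.9,  # assumed value, chosen only for illustration
) -> Tuple[bool, float]:
    """Query a suspect model with trigger samples and threshold the hit rate.

    Only model outputs are used, which matches the black-box setting.
    """
    if len(trigger_samples) == 0:
        raise ValueError("trigger set must not be empty")
    # Count trigger samples that the suspect model assigns to the watermark class.
    hits = sum(1 for x in trigger_samples if predict(x) == watermark_class)
    trigger_accuracy = hits / len(trigger_samples)
    # Ownership is claimed only if trigger accuracy reaches the threshold.
    return trigger_accuracy >= threshold, trigger_accuracy
```

A model that was never trained on the trigger set should rarely output the extra class, so its trigger accuracy is expected to fall well below any reasonable threshold.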
Abstract
Intellectual property protection of deep neural networks (DNNs) has drawn increasing attention, making DNN watermarking an active research topic. Compared with embedding watermarks directly into DNN parameters, trigger-set watermarking allows ownership to be verified without access to the internal details of the DNN, which makes it better suited to practical application scenarios. The cost is that the trigger samples must be crafted carefully. Mainstream methods construct trigger samples by inserting a noticeable pattern into clean samples in the spatial domain. This approach does not account for sample imperceptibility, sample robustness, or model robustness, and therefore limits both watermarking performance and model generalization. This motivates us to propose a novel DNN watermarking method based on Fourier perturbation analysis and frequency sensitivity clustering. First, we apply random perturbations to different frequency components of the input sample and analyze their impact on the task functionality of the DNN. Then, using K-means clustering, we determine the frequency components that yield superior watermarking performance and use them to craft the trigger samples. Our experiments show that the proposed method not only maintains the performance of the DNN on its original task, but also provides better watermarking performance than related works.
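The two steps named in the abstract, perturbing individual frequency components and clustering the resulting sensitivities, can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' implementation: the function names, the use of a mean absolute output change as the sensitivity score, the `model_fn` callable, and the small one-dimensional K-means routine are all hypothetical choices made for the example. Which cluster is then used for crafting trigger samples is left open here.

```python
import numpy as np


def perturb_frequency(image, u, v, strength, rng):
    """Add a random-phase perturbation to frequency (u, v) of a 2-D grayscale image."""
    h, w = image.shape
    spectrum = np.fft.fft2(image)
    delta = strength * np.exp(1j * rng.uniform(0.0, 2.0 * np.pi))
    spectrum[u, v] += delta
    # Perturb the mirrored component too, so the inverse transform stays real.
    spectrum[-u % h, -v % w] += np.conj(delta)
    return np.real(np.fft.ifft2(spectrum))


def sensitivity_map(model_fn, images, freqs, strength, rng):
    """Score each frequency by the mean absolute change it causes in model_fn's output.

    model_fn is a hypothetical stand-in for the DNN; the paper measures the
    impact on task functionality, which may be defined differently.
    """
    scores = []
    for u, v in freqs:
        changes = [
            np.abs(model_fn(perturb_frequency(img, u, v, strength, rng)) - model_fn(img)).mean()
            for img in images
        ]
        scores.append(np.mean(changes))
    return np.array(scores)


def kmeans_1d(values, k, iters=50):
    """Plain Lloyd's K-means on scalar values, initialized at evenly spaced quantiles."""
    values = np.asarray(values, dtype=float)
    centers = np.quantile(values, np.linspace(0.0, 1.0, k))
    for _ in range(iters):
        # Assign each value to its nearest center.
        labels = np.argmin(np.abs(values[:, None] - centers[None, :]), axis=1)
        # Recompute centers; an empty cluster keeps its previous center.
        new_centers = np.array([
            values[labels == j].mean() if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers
```

In practice, `freqs` would enumerate the candidate frequency indices of the input, `sensitivity_map` would be evaluated on a batch of clean samples, and `kmeans_1d` would group the frequencies by how strongly perturbing them affects the model.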
