Meta-FC: Meta-Learning with Feature Consistency for Robust and Generalizable Watermarking

Yuheng Li, Weitong Chen, Chengcheng Zhu, Jiale Zhang, Chunpeng Ge, Di Wu, Guodong Long

TL;DR

Meta-FC is a novel training strategy that enhances robustness and generalization: it randomly samples multiple distortions from the noise pool to construct a meta-training task, while holding out one distortion as a simulated "unknown" distortion for the meta-testing phase.

Abstract

Deep learning-based watermarking has made remarkable progress in recent years. To achieve robustness against various distortions, current methods commonly adopt a training strategy where a single random distortion (SRD) is chosen as the noise layer in each training batch. However, the SRD strategy treats distortions independently within each batch, neglecting the inherent relationships among different types of distortions and causing optimization conflicts across batches. As a result, the robustness and generalizability of the watermarking model are limited. To address this issue, we propose a novel training strategy that enhances robustness and generalization via meta-learning with feature consistency (Meta-FC). Specifically, we randomly sample multiple distortions from the noise pool to construct a meta-training task, while holding out one distortion as a simulated "unknown" distortion for the meta-testing phase. Through meta-learning, the model is encouraged to identify and utilize neurons that exhibit stable activations across different types of distortions, mitigating the optimization conflicts caused by the random sampling of diverse distortions in each batch. To further promote the transformation of stable activations into distortion-invariant representations, we introduce a feature consistency loss that constrains the decoded features of the same image subjected to different distortions to remain consistent. Extensive experiments demonstrate that, compared to the SRD training strategy, Meta-FC improves the robustness and generalization of various watermarking models by an average of 1.59%, 4.71%, and 2.38% under high-intensity, combined, and unknown distortions, respectively.
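
The task construction and the consistency constraint described in the abstract can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration: the function names, the placeholder distortion names in the pool, and the particular form of the consistency loss (mean squared deviation from the average feature vector). The abstract only states that decoded features of the same image under different distortions are constrained to remain consistent, so the paper's actual loss may differ.

```python
import random


def sample_meta_task(noise_pool, num_train, rng):
    """Draw num_train + 1 distinct distortions and hold the last one out.

    Returns (meta_train_distortions, held_out_distortion). The held-out
    distortion plays the role of the simulated "unknown" distortion.
    """
    if num_train + 1 > len(noise_pool):
        raise ValueError("noise pool too small for the requested task size")
    picked = rng.sample(noise_pool, num_train + 1)
    return picked[:-1], picked[-1]


def feature_consistency_loss(features):
    """Assumed form: mean squared deviation from the element-wise mean.

    `features` holds one decoded feature vector (a list of floats) per
    distortion applied to the same image. The loss is zero exactly when
    all vectors are identical.
    """
    n = len(features)
    dim = len(features[0])
    mean = [sum(f[i] for f in features) / n for i in range(dim)]
    squared = sum((f[i] - mean[i]) ** 2 for f in features for i in range(dim))
    return squared / (n * dim)


# Hypothetical noise pool; the names are placeholders, not the paper's list.
pool = ["jpeg", "gaussian_noise", "blur", "crop", "dropout"]
rng = random.Random(0)
train_distortions, held_out = sample_meta_task(pool, 3, rng)
```

Sampling without replacement guarantees that the held-out distortion never appears in the meta-training set of the same task, which is what makes it usable as a stand-in for an unseen distortion.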

Paper Structure (22 sections, 6 equations, 3 figures, 5 tables, 1 algorithm)


Figures (3)

  • Figure 1: The difference in the training process between SRD and Meta-FC. (a) The SRD pipeline. (b) The Meta-FC pipeline. In the meta-training phase, the model learns to be robust against known distortions. In the meta-testing phase, the model is evaluated under a simulated "unknown" distortion. Note that no truly unknown distortions are involved during the entire training process.
  • Figure 2: The whole training process of our proposed Meta-FC. First, the main encoder and decoder are used to process images under meta-training distortions, yielding the meta-training loss $\mathcal{L}_{meta\text{-}train}$ (composed of $\mathcal{L}_{w,n}$ and $\mathcal{L}^{tra}_{msg}$) and producing temporary encoder and decoder parameters. Next, these temporary parameters are evaluated on the meta-testing distortions to calculate the meta-testing loss $\mathcal{L}_{meta\text{-}test}$ (composed of $\mathcal{L}^{tes}_{msg}$). Subsequently, the image loss $\mathcal{L}_{img}$ (composed of $\mathcal{L}^{tra}_{img}$ and $\mathcal{L}^{tes}_{img}$) is computed based on the watermarked images generated by the main and the temporary encoders. Finally, the main model parameters are updated by minimizing the total loss $\mathcal{L}_{total}$, which consists of $\mathcal{L}_{meta\text{-}train}$, $\mathcal{L}_{meta\text{-}test}$, and $\mathcal{L}_{img}$.
  • Figure 3: Visual quality of SRD and our method across different models. The first row presents the cover image, followed by the watermarked images in the second row. The third row shows the watermarked images subjected to various distortions. The fourth row illustrates the residuals, which represent the difference between the watermarked and cover images and are magnified by a factor of 5 to enhance visibility. The final two rows report the PSNR (dB) and SSIM values of each model, respectively, where consistent visual quality is maintained across training methods for the same model.
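
The update order described in the Figure 2 caption (meta-training loss, temporary parameters, meta-testing loss, update of the main parameters on the total loss) can be sketched on a scalar toy problem. This is a MAML-style illustration under strong assumptions, not the paper's implementation: a single scalar `theta` stands in for the encoder and decoder parameters, each distortion is reduced to a hypothetical quadratic loss with its own target, the image loss is omitted, and the learning rates and unit loss weights are arbitrary.

```python
def loss(theta, target):
    """Toy per-distortion loss: squared distance to that distortion's target."""
    return (theta - target) ** 2


def grad(theta, target):
    """Derivative of the toy loss with respect to theta."""
    return 2.0 * (theta - target)


def total_loss(theta, train_targets, test_target, inner_lr=0.1):
    """Meta-training loss at theta plus meta-testing loss at the temporary theta."""
    n = len(train_targets)
    train_loss = sum(loss(theta, t) for t in train_targets) / n
    g_train = sum(grad(theta, t) for t in train_targets) / n
    theta_tmp = theta - inner_lr * g_train  # temporary parameters
    return train_loss + loss(theta_tmp, test_target)


def total_grad(theta, train_targets, test_target, inner_lr=0.1):
    """Exact gradient of total_loss, differentiating through the inner step."""
    n = len(train_targets)
    g_train = sum(grad(theta, t) for t in train_targets) / n
    theta_tmp = theta - inner_lr * g_train
    # The mean training loss has second derivative 2, so
    # d(theta_tmp)/d(theta) = 1 - 2 * inner_lr.
    d_tmp = 1.0 - 2.0 * inner_lr
    return g_train + grad(theta_tmp, test_target) * d_tmp


def meta_step(theta, train_targets, test_target, inner_lr=0.1, outer_lr=0.1):
    """One update of the main parameter on the combined objective."""
    g = total_grad(theta, train_targets, test_target, inner_lr)
    return theta - outer_lr * g


# Hypothetical targets: three meta-training distortions, one held-out distortion.
train_targets = [1.0, 2.0, 3.0]
test_target = 4.0
theta_before = 0.0
theta_after = meta_step(theta_before, train_targets, test_target)
```

Because the meta-testing loss is evaluated at the temporary parameter but differentiated with respect to the main parameter, the update favors directions that reduce the loss on the sampled distortions and on the held-out one after adaptation.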