Trigger-Based Fragile Model Watermarking for Image Transformation Networks

Preston K. Robinette, Dung T. Nguyen, Samuel Sasaki, Taylor T. Johnson

TL;DR

This work introduces a novel, trigger-based fragile model watermarking system for image transformation/generation networks that takes advantage of properties inherent to image outputs by manifesting watermarks as specific visual patterns, styles, or anomalies in the generated content when particular trigger inputs are used.

Abstract

In fragile watermarking, a sensitive watermark is embedded in an object in such a manner that the watermark breaks upon tampering. This fragile process can be used to ensure the integrity and source of watermarked objects. While fragile watermarking for model integrity has been studied for classification models, it has yet to be explored for image transformation/generation models. We introduce a novel, trigger-based fragile model watermarking system for image transformation/generation networks that takes advantage of properties inherent to image outputs, for example by manifesting watermarks as specific visual patterns, styles, or anomalies in the generated content when particular trigger inputs are used. Our approach, distinct from robust watermarking, effectively verifies the model's source and integrity across various datasets and attacks, outperforming baselines by 94%. We conduct additional experiments to analyze the security of this approach, the flexibility of the trigger and resulting watermark, and the sensitivity of performance to the watermarking loss. We also demonstrate the applicability of this approach on two different tasks (one immediate task and one downstream task). This is the first work to consider fragile model watermarking for image transformation/generation networks.
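To make the verification idea concrete, here is a minimal Python sketch of how a fragile watermark check could be run. It assumes the retrievability test is a mean squared error (MSE) between the model's output on the trigger and the expected watermark, compared against a fixed threshold. The names `verify_fragile_watermark` and `mse`, the threshold value, and the toy "intact" and "tampered" models are hypothetical illustrations, not the paper's implementation or metric.

```python
from typing import Callable, Sequence

Image = Sequence[float]  # flattened image, pixel values assumed in [0, 1]


def mse(a: Image, b: Image) -> float:
    """Mean squared error between two images with the same number of pixels."""
    if len(a) != len(b):
        raise ValueError("images must have the same number of pixels")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def verify_fragile_watermark(
    model: Callable[[Image], Image],
    trigger: Image,
    expected_watermark: Image,
    threshold: float = 1e-3,  # assumed tolerance, not a value from the paper
) -> bool:
    """True if the watermark is retrieved (model judged unmodified),
    False if it is broken (model judged tampered with)."""
    return mse(model(trigger), expected_watermark) <= threshold


# Toy stand-ins (hypothetical) for a watermarked model before and after an attack.
watermark = [0.0, 1.0, 0.0, 1.0]
trigger = [0.5, 0.5, 0.5, 0.5]


def intact_model(x: Image) -> Image:
    return list(watermark)  # maps the trigger exactly to the watermark


def tampered_model(x: Image) -> Image:
    return [0.8 * v + 0.1 for v in watermark]  # output drifted after an attack


print(verify_fragile_watermark(intact_model, trigger, watermark))    # True
print(verify_fragile_watermark(tampered_model, trigger, watermark))  # False
```

In a fragile scheme the check is meant to fail under modification: any attack that shifts the trigger output beyond the tolerance marks the model as tampered with. A tighter threshold makes the test more sensitive to tampering, but also to benign numerical differences.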

Paper Structure (25 sections, 7 equations, 6 figures, 7 tables)

Figures (6)

  • Figure 1: Robust vs. fragile watermarking.
  • Figure 2: Comparison between Image Classification and Image Transformation/Generation Models: The Image Classification Model predicts a label for the entire image (e.g., Class A, B, C, D, E), while the Image Transformation/Generation Model outputs a transformed or generated image.
  • Figure 3: The proposed fragile watermarking diagram for (a) immediate and (b) downstream tasks. For immediate tasks (a), a clean image is reconstructed normally before and after a perturbation, as shown by $\hat{x}_c$ in the (1) Test Phase and (2) Attack Phase. A trigger image $x_{w_f}$, however, produces a watermark signature before a modification, but this watermark is broken following an attack, as indicated by the ✓ for the attack phase in the Watermark Eval. For downstream tasks (b), a clean image $x_{c}$ results in a high-fidelity segmentation mask in both phases. For the trigger input, however, the watermark (a white patch in the top-left corner) is no longer retrievable after a modification to the model. These results demonstrate a successful fragile watermarking scheme.
  • Figure 4: Example triggers and watermarks for (1) immediate tasks and (2) downstream tasks. A patch (a) trigger is a small patch of color in the top-left corner. A block (b) trigger is a block of color that spans the entire image. A noise (c) trigger is a randomly generated image of Gaussian noise. A steganography (d) container is an image with an embedded watermark that is not visible. These trigger inputs are mapped to a (e) patch, (f) block, or (g) arbitrary image. For downstream semantic segmentation tasks, we consider (h) patch and (i) block triggers that map to (j) patch, (k) block, or (l) inverse watermarks. An (l) inverse watermark inverts the ground-truth segmentation mask.
  • Figure 5: Fidelity, retrievability, and fragility results for models watermarked with the proposed approach using a patch-to-block trigger scheme (UNet, LinkNet, FPN, PSPNet, PAN, LRASPP, and Deeplabv3), as well as for two baselines. The left-most column shows a clean input $x_{c}$, which should resemble both the reconstructed image from a model without the watermark (w/o $W_f$) and the reconstructed image from a model with the watermark (w/ $W_f$). The middle column shows the reconstructed output for a trigger input; this image should be the green block watermark. The last three columns show the output for a trigger image after the corresponding attacks: ftune1, ftune5, and overwrite. All models using the proposed approach meet the fidelity, retrievability, and fragility criteria. The two baselines, however, are not fragile, as indicated by the persistent watermark after each attack.
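The trigger and watermark types listed for Figure 4 can be sketched in plain Python as follows. This is a hypothetical illustration: images are H x W x 3 nested lists, and the function names, patch size, colors, noise scale, and seed are assumed values rather than the paper's settings. The inverse watermark is shown for a binary mask only, and the steganography container (d) is not sketched.

```python
import random


def blank(h, w, fill=(0.0, 0.0, 0.0)):
    """H x W x 3 image where every pixel is a separate copy of `fill`."""
    return [[list(fill) for _ in range(w)] for _ in range(h)]


def patch_image(h, w, color, patch=4, base=(0.0, 0.0, 0.0)):
    """Small square of `color` in the top-left corner (patch trigger/watermark)."""
    img = blank(h, w, base)
    for i in range(min(patch, h)):
        for j in range(min(patch, w)):
            img[i][j] = list(color)
    return img


def block_image(h, w, color):
    """A block of a single color spanning the entire image."""
    return blank(h, w, color)


def noise_image(h, w, seed=0, sigma=1.0):
    """Gaussian-noise trigger; a fixed seed (assumed) makes it reproducible."""
    rng = random.Random(seed)
    return [[[rng.gauss(0.0, sigma) for _ in range(3)] for _ in range(w)]
            for _ in range(h)]


def inverse_mask(mask):
    """Invert a binary segmentation mask (0 <-> 1) for an inverse watermark."""
    return [[1 - v for v in row] for row in mask]


# Patch-to-block pairing: a red patch trigger (assumed color) and a green block watermark.
trigger = patch_image(8, 8, color=(1.0, 0.0, 0.0))
watermark = block_image(8, 8, color=(0.0, 1.0, 0.0))
print(trigger[0][0], trigger[7][7])    # [1.0, 0.0, 0.0] [0.0, 0.0, 0.0]
print(inverse_mask([[0, 1], [1, 0]]))  # [[1, 0], [0, 1]]
```

The example pairs a patch trigger with a green block watermark, mirroring the patch-to-block scheme in Figure 5. Seeding the noise generator keeps a noise trigger reproducible, which matters because verification has to present the same trigger again after the model may have been modified.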