AMUSE: Adaptive Multi-Segment Encoding for Dataset Watermarking

Saeed Ranjbar Alvar; Mohammad Akbari; David Ming Xuan Yue; Yong Zhang

TL;DR

AMUSE introduces an adaptive multi-segment encoding scheme for dataset watermarking that maps a watermark into shorter sub-messages distributed across dataset samples. The encoder selects parameters N and K to balance robustness against subset attacks with the required protection level, while a plug-and-play decoder reconstructs the original watermark from sub-messages. Across multiple off-the-shelf image watermarking methods, AMUSE improves watermark extraction accuracy and watermarked image quality (PSNR) under attacks and demonstrates robustness to subset leakage. The approach is applicable beyond images, with potential extension to other modalities such as text and video, offering a scalable ownership protection mechanism for large datasets.

Abstract

Curating high-quality datasets, which play a key role in the emergence of new AI applications, requires considerable time, money, and computational resources, so effective ownership protection of datasets is becoming critical. Recently, imperceptible watermarking techniques have been used to protect the ownership of image datasets by embedding ownership information (i.e., a watermark) into the individual image samples. Embedding the entire watermark into every sample leads to significant redundancy in the embedded information, which degrades both the quality of the watermarked dataset and the extraction accuracy. In this paper, a multi-segment encoding-decoding method for dataset watermarking (called AMUSE) is proposed to adaptively map the original watermark into a set of shorter sub-messages and vice versa. Our message encoder is adaptive: it adjusts the length of the sub-messages according to the protection requirements for the target dataset. Existing image watermarking methods are then employed to embed the sub-messages into the original images in the dataset and to extract them from the watermarked images. Our decoder then reconstructs the original message from the extracted sub-messages. The proposed encoder and decoder are plug-and-play modules that can easily be added to any watermarking method. Extensive experiments performed with multiple watermarking solutions show that applying AMUSE improves the overall message extraction accuracy by up to 28% for the same dataset quality. Furthermore, the image dataset quality is enhanced by a PSNR of $\approx$2 dB on average, while the extraction accuracy also improves, for one of the tested image watermarking methods.
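The encode-then-reconstruct idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' algorithm: the layout below is an assumption chosen only because it reproduces the numbers in the Figure 2 caption (a 300-bit message with $N=3$, $K=1$ giving 202-bit sub-messages). It assumes the message is cut into $N$ equal segments, each sub-message carries $N-K$ of them, and a short binary index identifies which segments are present. The function names `encode` and `decode` are hypothetical, the role of $n$ is not modeled, and bit errors in the extracted sub-messages are ignored.

```python
from itertools import combinations


def _layout(n_segments: int, k_dropped: int):
    """Enumerate which segments each sub-message keeps (assumed layout)."""
    kept_sets = list(combinations(range(n_segments), n_segments - k_dropped))
    # Bits needed to index the kept sets; at least one bit is always used.
    index_bits = max(1, (len(kept_sets) - 1).bit_length())
    return kept_sets, index_bits


def encode(message: str, n_segments: int, k_dropped: int) -> list[str]:
    """Map a bit string to sub-messages: index bits + (N - K) segments."""
    if len(message) % n_segments != 0:
        raise ValueError("message length must be divisible by n_segments")
    seg_len = len(message) // n_segments
    segments = [message[i * seg_len:(i + 1) * seg_len] for i in range(n_segments)]
    kept_sets, index_bits = _layout(n_segments, k_dropped)
    return [
        format(idx, f"0{index_bits}b") + "".join(segments[i] for i in kept)
        for idx, kept in enumerate(kept_sets)
    ]


def decode(sub_messages, message_len: int, n_segments: int, k_dropped: int):
    """Rebuild the message from any sub-messages that cover all segments.

    Returns None when some segment is missing from every sub-message.
    """
    seg_len = message_len // n_segments
    kept_sets, index_bits = _layout(n_segments, k_dropped)
    recovered = {}
    for sub in sub_messages:
        idx = int(sub[:index_bits], 2)
        if idx >= len(kept_sets):
            continue  # index does not name a valid kept set; skip it
        payload = sub[index_bits:]
        for pos, seg_id in enumerate(kept_sets[idx]):
            # Keep the first copy seen of each segment (no error correction).
            recovered.setdefault(seg_id, payload[pos * seg_len:(pos + 1) * seg_len])
    if len(recovered) < n_segments:
        return None
    return "".join(recovered[i] for i in range(n_segments))


# Example: a 300-bit message with N=3, K=1, as in the Figure 2 caption.
message = "01" * 150
sub_messages = encode(message, n_segments=3, k_dropped=1)
restored = decode(sub_messages[:2], len(message), n_segments=3, k_dropped=1)
```

Under this assumed layout, each sub-message is 200 payload bits plus a 2-bit index, and any two of the three sub-messages are enough to rebuild the full message, which is one way to picture why leaking only a subset of the dataset need not destroy the watermark.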

Paper Structure (22 sections, 4 equations, 10 figures, 4 tables, 2 algorithms)

Introduction
Related Works
Problem Definition
Proposed Method
Adaptive Multi-Segment Encoder
$N$ and $K$ selection:
Dataset Watermarking using AMUSE
AMUSE Decoder
Experimental Results
Experimental Settings
Metrics
Does $L$ matter in dataset watermarking?
Does applying AMUSE help?
Is AMUSE plug-and-play?
Is AMUSE robust to subset attack?
...and 7 more sections

Figures (10)

Figure 1: The overview of message encoding-decoding for dataset watermarking.
Figure 2: An example for encoding a 300-bit message with $N=3$, $K=1$, and $n=6$. The length of the obtained sub-messages is 202 bits.
Figure 3: Extraction bit (top) and word (bottom) accuracy vs. message length for HiDDeN-based dataset watermarking.
Figure 4: Average bit and word accuracy (with attacks) vs. message length for the SSL-based dataset watermarking.
Figure 5: The average bit (left) and word (right) accuracy for SSL+AMUSE compared to the baseline for given PSNR values with $L=100$ (top), $L=200$ (middle), and $L=300$ (bottom).
...and 5 more figures
