Data Watermarking for Sequential Recommender Systems

Sixiao Zhang, Cheng Long, Wei Yuan, Hongxu Chen, Hongzhi Yin

TL;DR

This work defines data watermarking for sequential recommender systems and presents DWRS, with two variants: DWRS-D for dataset watermarking and DWRS-U for user watermarking. It formalizes watermark design around a watermark body $x_{wm}$ and target $y_{wm}$, leveraging the model’s receptive field to ensure memorization while maintaining unnoticeability. DWRS-D uses a structured insertion strategy and selects unpopular items to maximize discriminability, whereas DWRS-U places watermarks within a target user’s sequence and leverages subsequence popularity to compensate for shorter signals. Across three datasets and five representative models, DWRS achieves high watermark validity with minimal impact on utility and shows robustness against finetuning, distillation, and sequential rule mining, though DWRS-U exhibits more model-dependent variability. The results suggest practical applicability for data owners who want to claim ownership and deter unauthorized use, with DWRS-D generally offering stronger performance and stealth. The code is released for reproducibility.

Abstract

In the era of large foundation models, data has become a crucial component in building high-performance AI systems. As the demand for high-quality and large-scale data continues to rise, data copyright protection is attracting increasing attention. In this work, we explore the problem of data watermarking for sequential recommender systems, where a watermark is embedded into the target dataset and can be detected in models trained on that dataset. We focus on two settings: dataset watermarking, which protects the ownership of the entire dataset, and user watermarking, which safeguards the data of individual users. We present a method named Dataset Watermarking for Recommender Systems (DWRS) to address both settings. We define the watermark as a sequence of consecutive items inserted into normal users' interaction sequences. We define a Receptive Field (RF) to guide the insertion process and facilitate the memorization of the watermark. Extensive experiments on five representative sequential recommendation models and three benchmark datasets demonstrate the effectiveness of DWRS in protecting data copyright while preserving model utility.
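To make the idea of "a sequence of consecutive items inserted into normal users' interaction sequences" concrete, the following is a minimal sketch, not the authors' implementation. It assumes interaction sequences are plain Python lists of item ids, picks the least frequent items as watermark items (in the spirit of the TL;DR's "unpopular items"), and inserts the watermark at a random position in randomly chosen sequences. The random user choice and random position are simplifying assumptions: DWRS guides insertion with the receptive field, which is not reproduced here. The function names `pick_unpopular_items` and `insert_watermark` are hypothetical.

```python
import random
from collections import Counter


def pick_unpopular_items(sequences, n_items):
    """Return the n_items least frequent item ids (ties broken by item id)."""
    counts = Counter(item for seq in sequences for item in seq)
    ranked = sorted(counts.items(), key=lambda kv: (kv[1], kv[0]))
    return [item for item, _ in ranked[:n_items]]


def insert_watermark(sequences, watermark, n_users, seed=0):
    """Insert `watermark` as one contiguous block into n_users sequences.

    Assumption: users and insertion positions are chosen uniformly at
    random. The paper instead uses a receptive field to guide insertion.
    """
    rng = random.Random(seed)
    chosen = set(rng.sample(range(len(sequences)), n_users))
    watermarked = []
    for idx, seq in enumerate(sequences):
        seq = list(seq)  # copy so the input data is left untouched
        if idx in chosen:
            pos = rng.randint(0, len(seq))  # inclusive on both ends
            seq = seq[:pos] + list(watermark) + seq[pos:]
        watermarked.append(seq)
    return watermarked, sorted(chosen)


if __name__ == "__main__":
    data = [[1, 2, 3, 1], [1, 2, 4], [2, 1, 5, 3]]
    wm = pick_unpopular_items(data, 3)
    new_data, users = insert_watermark(data, wm, n_users=2)
    print("watermark:", wm)
    print("watermarked users:", users)
    print(new_data)
```

In this sketch the last watermark item would play the role of the target $y_{wm}$ and the preceding items the body $x_{wm}$; how many sequences to watermark and how long the watermark should be are tuning choices left open here.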

Paper Structure

This paper contains 40 sections, 6 equations, 8 figures, 7 tables, 2 algorithms.

Figures (8)

  • Figure 1: An illustration of data watermarking for recommender systems. A watermark sequence (books, headphones, apples) is inserted into a training sequence. A model that has memorized the watermark predicts apples for test queries ending with books and headphones.
  • Figure 2: An illustration of the receptive field. For the target item, the shirt and the hat have large attention values to it, while the headphone and the book contribute little. Therefore, the receptive field of the target item includes its previous item and subsequent item.
  • Figure 3: Attention heatmaps of the last 20 items of ML-1M and Beauty on SASRec and Bert4Rec. X-axis and Y-axis denote the item indices. The intensity of the color represents the magnitude of attention. Each entry shows the attention from X to Y. Padding tokens are inserted at the beginning for sequences shorter than 20.
  • Figure 4: DWRS-D: watermark validity (NDCG).
  • Figure 5: DWRS-D: (a) watermark validity (Recall) of Bert4Rec on ML-1M under different watermark lengths $l$. (b) watermark validity and model utility (Recall@10) of SASRec on ML-1M after finetuning.
  • ...and 3 more figures
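The figure captions report watermark validity as Recall and NDCG. As a rough sketch of how such a check could be computed for a single watermark query (a query ending with the watermark body, as in Figure 1), the code below ranks the watermark target among all candidate items and scores it. It assumes the model's output is available as a dictionary mapping item ids to scores; that interface and the function names are hypothetical, and the paper's exact evaluation protocol may differ. With a single relevant item, NDCG@k reduces to $1/\log_2(\text{rank}+1)$ when the target is in the top $k$, and 0 otherwise.

```python
import math


def rank_of_target(scores, target):
    """1-based rank of `target` when items are sorted by descending score."""
    target_score = scores[target]
    return 1 + sum(1 for s in scores.values() if s > target_score)


def recall_at_k(rank, k):
    """Recall@k for a single relevant item: 1 if it is in the top k."""
    return 1.0 if rank <= k else 0.0


def ndcg_at_k(rank, k):
    """NDCG@k for a single relevant item (the ideal DCG is 1)."""
    return 1.0 / math.log2(rank + 1) if rank <= k else 0.0


if __name__ == "__main__":
    # Hypothetical scores returned for the query (books, headphones).
    scores = {"apples": 0.90, "shirt": 0.95, "hat": 0.10}
    rank = rank_of_target(scores, "apples")
    print(rank, recall_at_k(rank, 10), ndcg_at_k(rank, 10))
```

Averaging these values over many watermark queries would give a validity score for a suspect model; comparing it against a model trained without the watermark is one way to judge whether the watermark is discriminative.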

Theorems & Definitions (1)

  • Definition 3.1: DWRS-D watermark structure