Data Watermarking for Sequential Recommender Systems
Sixiao Zhang, Cheng Long, Wei Yuan, Hongxu Chen, Hongzhi Yin
TL;DR
This work defines data watermarking for sequential recommender systems and presents DWRS, which has two variants: DWRS-D for dataset watermarking and DWRS-U for user watermarking. The watermark is formalized as a watermark body $x_{wm}$ and a target $y_{wm}$, and its design uses the model's receptive field to promote memorization while keeping the watermark unnoticeable. DWRS-D uses a structured insertion strategy and selects unpopular items to maximize discriminability, whereas DWRS-U places the watermark within a target user's sequence and uses subsequence popularity to compensate for the shorter signal. Across three datasets and five representative models, DWRS achieves high watermark validity with minimal impact on utility and is robust against finetuning, distillation, and sequential rule mining, though DWRS-U shows more model-dependent variability. The results suggest that data owners can use DWRS to claim ownership and deter unauthorized use, with DWRS-D generally offering stronger performance and stealth. The code is released for reproducibility.
Abstract
In the era of large foundation models, data has become a crucial component in building high-performance AI systems. As the demand for high-quality, large-scale data continues to rise, data copyright protection is attracting increasing attention. In this work, we explore the problem of data watermarking for sequential recommender systems, where a watermark is embedded into the target dataset and can be detected in models trained on that dataset. We focus on two settings: dataset watermarking, which protects the ownership of the entire dataset, and user watermarking, which safeguards the data of individual users. We present a method named Dataset Watermarking for Recommender Systems (DWRS) to address both settings. We define the watermark as a sequence of consecutive items inserted into normal users' interaction sequences, and we define a Receptive Field (RF) that guides the insertion process so that the model memorizes the watermark. Extensive experiments on five representative sequential recommendation models and three benchmark datasets demonstrate the effectiveness of DWRS in protecting data copyright while preserving model utility.
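To make the idea of a watermark as consecutive items concrete, the following is a minimal illustrative sketch and not the authors' DWRS implementation. It assumes that interaction sequences are plain Python lists of item ids, that the watermark is built from the least popular items, and that validity is checked by feeding the watermark body $x_{wm}$ to a model and testing whether the target $y_{wm}$ appears in its top-k output. The function names (`pick_unpopular_items`, `insert_watermark`, `watermark_hit`) and the `predict_top_k` callable are hypothetical. Insertion positions are chosen at random here, so the sketch does not model the receptive-field guidance described in the paper.

```python
import random
from collections import Counter


def pick_unpopular_items(sequences, k):
    """Return the k least frequently interacted items (ties broken by item id)."""
    counts = Counter(item for seq in sequences for item in seq)
    ranked = sorted(counts.items(), key=lambda kv: (kv[1], kv[0]))
    return [item for item, _ in ranked[:k]]


def insert_watermark(sequences, watermark, num_users, seed=0):
    """Insert the watermark as consecutive items into num_users random sequences.

    Assumption: positions are uniform at random; the paper instead guides
    insertion with a receptive field, which is not reproduced here.
    """
    rng = random.Random(seed)
    chosen = set(rng.sample(range(len(sequences)), num_users))
    out = []
    for idx, seq in enumerate(sequences):
        seq = list(seq)
        if idx in chosen:
            pos = rng.randint(0, len(seq))  # inclusive bounds, so appending is possible
            seq = seq[:pos] + list(watermark) + seq[pos:]
        out.append(seq)
    return out, chosen


def watermark_hit(predict_top_k, watermark):
    """Check whether a model recommends y_wm when given x_wm.

    predict_top_k is a hypothetical callable: list of item ids -> list of item ids.
    """
    body, target = watermark[:-1], watermark[-1]
    return target in predict_top_k(body)
```

In this sketch, the last item of the watermark plays the role of the target and the preceding items form the body; a data owner would call `watermark_hit` with a wrapper around the suspect model's top-k prediction function.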
