SPSW: Database Watermarking Based on Fake Tuples and Sparse Priority Strategy

Zhiwen Ren; Zehua Ma; Weiming Zhang; Nenghai Yu

TL;DR

The paper addresses tracing database leaks without modifying the original data. It introduces SPSW, a fake-tuple-based watermarking scheme that uses a sparse priority strategy to embed multi-bit watermarks with fewer insertions. It sets the watermark length to $L = \lceil \log_2(n_u) \rceil$ for $n_u$ users, uses NLP-generated fake tuples, and maps each user to a watermark sequence, prioritizing sequences with more '0' bits to reduce disturbance to the database. Theoretical analysis shows improved transparency, i.e., fewer inserted fake tuples ($NI < xL/2$, where $x$ is the number of fake tuples per group), and improved robustness under deletion attacks, with an expected extraction reliability $EP \approx (1 - \frac{p^x}{2})^L$, outperforming existing fake-tuple methods. Experiments on robustness and transparency corroborate the theory, showing higher resilience to deletions and lower insertion overhead, which makes traceability scalable in practice.
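The watermark length and the sparse priority assignment described in the TL;DR can be illustrated with a short Python sketch. It assumes that the all-zero sequence may be assigned (it is needed when $n_u = 2^L$, since then every $L$-bit sequence must be used) and breaks ties between sequences with the same number of '1' bits in lexicographic order of bit positions; both are our choices, not details stated in this summary. The helper names `watermark_length`, `sparse_priority_sequences`, and `average_inserted` are hypothetical.

```python
import math
from itertools import combinations


def watermark_length(n_users: int) -> int:
    # L = ceil(log2(n_u)); at least 1 bit so a single user still gets a sequence.
    return max(1, math.ceil(math.log2(n_users)))


def sparse_priority_sequences(n_users: int) -> list[str]:
    # Hand out the sparsest L-bit sequences first: all sequences with k '1' bits
    # before any sequence with k + 1 '1' bits (assumed tie-break: lexicographic
    # order of the '1' positions, as produced by itertools.combinations).
    L = watermark_length(n_users)
    seqs: list[str] = []
    for k in range(L + 1):
        for ones in combinations(range(L), k):
            seqs.append("".join("1" if i in ones else "0" for i in range(L)))
            if len(seqs) == n_users:
                return seqs
    return seqs


def average_inserted(n_users: int, x: int) -> float:
    # Each '1' bit costs one group of x fake tuples; '0' bits cost nothing.
    seqs = sparse_priority_sequences(n_users)
    return x * sum(s.count("1") for s in seqs) / len(seqs)


if __name__ == "__main__":
    print(sparse_priority_sequences(5))  # prints ['000', '100', '010', '001', '110']
    print(average_inserted(5, 4))        # prints 4.0
```

For example, with $n_u = 5$ users the sketch assigns `000`, `100`, `010`, `001`, `110`, so with $x = 4$ fake tuples per group it inserts 4 tuples per copy on average, against $xL/2 = 6$ for sequences whose bits are '1' half the time. In this sketch the average reaches $xL/2$ only when $n_u$ is an exact power of two, because every $L$-bit sequence is then in use.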

Abstract

Databases play a crucial role in storing and managing vast amounts of data across organizations and industries, yet the risk of database leakage poses a significant threat to data privacy and security. To trace the source of a leak, researchers have proposed many database watermarking schemes. Among them, fake-tuple-based database watermarking shows great potential because it does not modify the original data, so the watermarked database remains fully usable. However, existing fake-tuple-based schemes must insert a large number of fake tuples to embed each watermark bit, which lowers watermark transparency. We therefore propose SPSW, a novel database watermarking scheme based on fake tuples and a sparse priority strategy, which achieves the same watermark capacity as the existing embedding strategy while inserting fewer fake tuples. Specifically, for each database about to be watermarked, we prioritize embedding the sparsest watermark sequence, i.e., the one containing the most '0' bits among the currently available watermark sequences. For each bit of this sparse sequence, if the bit is '1', SPSW embeds the corresponding set of fake tuples into the database; otherwise, the database is left unchanged. Theoretical analysis shows that the sparse priority strategy not only improves transparency but also enhances the robustness of the watermark. Comparative experiments against other database watermarking schemes further validate the superior performance of SPSW, in line with the theoretical analysis.
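The embedding rule from the abstract ('1' bit: insert the corresponding set of fake tuples; '0' bit: leave the database untouched) and a matching extraction step can be sketched in Python as below. This is an illustration under stated assumptions, not the paper's implementation: fake tuples are random strings standing in for the NLP-generated tuples, each bit position has one fixed group of $x$ fake tuples shared by all users, and a bit is read as '1' if at least one tuple of its group is found in the suspect database. All function names are hypothetical.

```python
import random
import string

Row = tuple[str, ...]


def make_fake_groups(L: int, x: int, n_cols: int, seed: int = 0) -> list[list[Row]]:
    # Hypothetical stand-in for the paper's NLP-based generator:
    # each fake tuple is n_cols random lowercase strings.
    rng = random.Random(seed)

    def fake_value() -> str:
        return "".join(rng.choices(string.ascii_lowercase, k=8))

    # One group of x fake tuples per watermark bit position (assumption).
    return [[tuple(fake_value() for _ in range(n_cols)) for _ in range(x)]
            for _ in range(L)]


def embed(db: list[Row], watermark: str, groups: list[list[Row]], seed: int = 0) -> list[Row]:
    # Insert group i only where bit i is '1'; '0' bits add nothing.
    marked = list(db)
    for bit, group in zip(watermark, groups):
        if bit == "1":
            marked.extend(group)
    random.Random(seed).shuffle(marked)  # scatter fake tuples among real ones
    return marked


def extract(suspect: list[Row], groups: list[list[Row]]) -> str:
    # Assumed rule: bit i is '1' if any tuple of group i survives.
    present = set(suspect)
    return "".join("1" if any(t in present for t in group) else "0" for group in groups)


def trace(suspect: list[Row], groups: list[list[Row]], user_to_wm: dict[str, str]) -> list[str]:
    # Return every user whose assigned sequence matches the extracted one.
    wm = extract(suspect, groups)
    return [user for user, w in user_to_wm.items() if w == wm]


if __name__ == "__main__":
    original = [("alice", "paris", "42"), ("bob", "rome", "37")]
    groups = make_fake_groups(L=3, x=4, n_cols=3)
    users = {"u0": "000", "u1": "100", "u2": "010", "u3": "001", "u4": "110"}
    leaked = embed(original, users["u4"], groups)
    print(trace(leaked, groups, users))  # prints ['u4']
```

Under these assumptions, reading a bit as '1' when any of its $x$ tuples survives lets a '1' bit tolerate partial deletion, and a '0' bit cannot be flipped by deletion at all, since no tuples were inserted for it.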

Paper Structure (24 sections, 19 equations, 5 figures, 2 tables, 2 algorithms)

Introduction
Related Work
Method
Watermark Embedding Process
Watermark Assignment
Fake Tuples Generation
Watermark Embedding
Watermark Extraction Process
Theoretical Analysis
Transparency Analysis
Robustness Analysis
Probability that a certain bit is right
Probability that the extracted watermark is right
Probability that there are $k$ '1's in one watermark
Probability Expectation of accurately extracting the watermark
...and 9 more sections
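The Robustness Analysis subsections listed above (probability that a bit is right, that the whole watermark is right, that a watermark has $k$ '1' bits, and the expected probability of correct extraction) suggest a derivation that can be reproduced numerically. Since this summary does not define $p$, the sketch below assumes that $p$ is the probability that a deletion attack removes any given tuple, independently across tuples. Under that assumption a '0' bit is always read correctly, a '1' bit is lost only if all $x$ tuples of its group are deleted (probability $p^x$), and a watermark with $k$ '1' bits survives with probability $(1-p^x)^k$. If $k \sim \mathrm{Binomial}(L, 1/2)$, the binomial theorem gives $\sum_{k=0}^{L} \binom{L}{k} 2^{-L} (1-p^x)^k = \left(\frac{2-p^x}{2}\right)^L = \left(1-\frac{p^x}{2}\right)^L$, matching the closed form quoted in the TL;DR. Averaging instead over the $n_u$ sparsest sequences can only give an equal or higher value, since $(1-p^x)^k$ decreases in $k$. The function names are hypothetical.

```python
import math


def bit_correct_prob(bit: str, p: float, x: int) -> float:
    # Assumes each tuple is deleted independently with probability p.
    # A '0' bit has no fake tuples, so deletion cannot turn it into '1'.
    # A '1' bit is lost only if all x tuples of its group are deleted.
    return 1.0 if bit == "0" else 1.0 - p ** x


def watermark_correct_prob(seq: str, p: float, x: int) -> float:
    # Bits are independent under the assumption: (1 - p^x)^k for k '1' bits.
    return math.prod(bit_correct_prob(b, p, x) for b in seq)


def expected_prob_random(L: int, p: float, x: int) -> float:
    # k ~ Binomial(L, 1/2): each bit is '1' with probability 1/2.
    return sum(math.comb(L, k) / 2 ** L * (1.0 - p ** x) ** k for k in range(L + 1))


def expected_prob_assigned(seqs: list[str], p: float, x: int) -> float:
    # Average success probability over the sequences actually handed out.
    return sum(watermark_correct_prob(s, p, x) for s in seqs) / len(seqs)
```

For instance, with $p = 0.5$ and $x = 2$, the five sparse sequences `000`, `100`, `010`, `001`, `110` give an average of 0.7625, against $0.875^3 \approx 0.670$ for random 3-bit watermarks.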

Figures (5)

Figure 1: The framework of the proposed SPSW scheme.
Figure 2: The effect of the number $x$ of fake tuples per group on the extraction accuracy.
Figure 3: The effect of the number of users $n_u$ on the extraction accuracy.
Figure 4: Comparison of the extraction accuracy between comparative schemes and the proposed SPSW.
Figure 5: Average value of inserted fake tuples.
