Multi-Designated Detector Watermarking for Language Models

Zhengan Huang; Gongxian Zeng; Xin Mu; Yu Wang; Yue Yu

TL;DR

This paper introduces claimability as an optional security feature for multi-designated detector watermarking (MDDW), enabling model providers to assert ownership of LLM outputs within designated-detector settings, and proposes a generic transformation that converts any multi-designated verifier signature (MDVS) scheme into a claimable MDVS scheme.

Abstract

In this paper, we initiate the study of multi-designated detector watermarking (MDDW) for large language models (LLMs). This technique allows model providers to generate watermarked outputs from LLMs with two key properties: (i) only specific, possibly multiple, designated detectors can identify the watermarks, and (ii) there is no perceptible degradation in the output quality for ordinary users. We formalize the security definitions for MDDW and present a framework for constructing MDDW for any LLM using multi-designated verifier signatures (MDVS). Recognizing the significant economic value of LLM outputs, we introduce claimability as an optional security feature for MDDW, enabling model providers to assert ownership of LLM outputs within designated-detector settings. To support claimable MDDW, we propose a generic transformation converting any MDVS to a claimable MDVS. Our implementation of the MDDW scheme highlights its advanced functionalities and flexibility over existing methods, with satisfactory performance metrics.

Paper Structure (27 sections, 12 theorems, 45 equations, 17 figures, 1 table, 15 algorithms)

Introduction
Preliminaries
Language models
Multi-designated verifier signature
Multi-designated detector watermarking
MDDW construction
Generic construction of MDDW
MDDW construction with claimability
Instantiation of claimable MDVS
Evaluation
Preliminaries: cryptographic assumptions and lemmas
Preliminaries: pseudorandom function, commitment, and signature
Preliminaries: an MDVS scheme from \cite{au2014strong} and the BLS signature \cite{boneh2004short}
Proof of Theorem \ref{thm:MDDW_required_security}
Proof of completeness (in Theorem \ref{thm:MDDW_required_security})
...and 12 more sections

Key Result

Theorem

If an MDDW scheme satisfies both soundness and the off-the-record property for any subset, then the size of the generated watermarks must be $\Omega(n)$, where $n$ is the number of designated detectors (i.e., $|S|=n$).
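
To give a feel for what watermark size linear in $n$ looks like, the following is a toy sketch only. It is not the paper's MDVS-based construction and it does not establish the lower bound. It assumes a hypothetical setup in which the provider shares one secret key with each designated detector and attaches one truncated HMAC tag per detector, kept as a separate byte string rather than embedded in the text. The names `keygen`, `make_tag`, `detect`, and `TAG_BYTES` are illustrative assumptions.

```python
import hashlib
import hmac
import secrets

# Assumption: 16-byte truncated HMAC-SHA256 tag per designated detector.
TAG_BYTES = 16


def keygen() -> bytes:
    """Sample a fresh 32-byte secret shared between provider and one detector."""
    return secrets.token_bytes(32)


def _mac(key: bytes, text: str) -> bytes:
    """Truncated HMAC-SHA256 of the text under one detector's key."""
    return hmac.new(key, text.encode("utf-8"), hashlib.sha256).digest()[:TAG_BYTES]


def make_tag(text: str, detector_keys: list[bytes]) -> bytes:
    """Concatenate one tag per designated detector: length is n * TAG_BYTES."""
    return b"".join(_mac(k, text) for k in detector_keys)


def detect(text: str, tag: bytes, key: bytes) -> bool:
    """A detector accepts if any slot of the tag matches its own MAC."""
    expected = _mac(key, text)
    slots = [tag[i:i + TAG_BYTES] for i in range(0, len(tag), TAG_BYTES)]
    return any(hmac.compare_digest(expected, s) for s in slots)
```

In this toy, the tag grows by `TAG_BYTES` for every additional designated detector, and a party without one of the shared keys cannot check the tag. Because each detector holds the same key as the provider, a detector could also produce its own slot, which loosely mirrors the intuition behind off-the-record properties; the paper's formal definitions and MDVS-based scheme are different and considerably stronger.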

Figures (17)

Figure 1: The MDDW framework based on MDVS
Figure 2: Games $\textup{G}^{\textup{cons}}_{\textup{MDVS},\mathcal{A}}(\lambda)$ and $\textup{G}^{\textup{unforg}}_{\textup{MDVS},\mathcal{A}}(\lambda)$ for MDVS; the oracles are given in Fig. \ref{fig:MDVS_oracle}
Figure 3: The oracles for the games defining security notions for MDVS
Figure 4: Games $\textup{G}^{\textup{otr-ds}}_{\textup{MDVS},\mathcal{A},\textup{FgeDS}}(\lambda)$ and $\textup{G}^{\textup{otr-as}}_{\textup{MDVS},\mathcal{A},\textup{FgeAS}}(\lambda)$ for MDVS; the oracles are given in Fig. \ref{fig:MDVS_oracle}
Figure 5: Games $\textup{G}^{\textup{cons}}_{\textup{MDDW},\mathcal{A}}(\lambda)$, $\textup{G}^{\textup{sound}}_{\textup{MDDW},\mathcal{A}}(\lambda)$ and $\textup{G}^{\textup{dist-fr}}_{\textup{MDDW},\mathcal{A}}(\lambda)$ for MDDW; the oracles are given in Fig. \ref{fig:MDDW_oracle}
...and 12 more figures

Theorems & Definitions (52)

Theorem
Proof
Definition: Auto-regressive model
Definition: Correctness
Definition: Consistency
Definition: Unforgeability
Definition: Off-the-record for designated set
Definition: Off-the-record for any subset
Definition: Completeness
Definition: Consistency
...and 42 more
