Multi-Designated Detector Watermarking for Language Models

Zhengan Huang, Gongxian Zeng, Xin Mu, Yu Wang, Yue Yu

TL;DR

This paper introduces claimability as an optional security feature for MDDW, enabling model providers to assert ownership of LLM outputs within designated-detector settings, and proposes a generic transformation converting any MDVS to a claimable MDVS.

Abstract

In this paper, we initiate the study of multi-designated detector watermarking (MDDW) for large language models (LLMs). This technique allows model providers to generate watermarked outputs from LLMs with two key properties: (i) only specific, possibly multiple, designated detectors can identify the watermarks, and (ii) there is no perceptible degradation in the output quality for ordinary users. We formalize the security definitions for MDDW and present a framework for constructing MDDW for any LLM using multi-designated verifier signatures (MDVS). Recognizing the significant economic value of LLM outputs, we introduce claimability as an optional security feature for MDDW, enabling model providers to assert ownership of LLM outputs within designated-detector settings. To support claimable MDDW, we propose a generic transformation converting any MDVS to a claimable MDVS. Our implementation of the MDDW scheme highlights its advanced functionalities and flexibility over existing methods, with satisfactory performance metrics.

Paper Structure

This paper contains 27 sections, 12 theorems, 45 equations, 17 figures, 1 table, 15 algorithms.

Key Result

Theorem

If an MDDW scheme satisfies both soundness and the off-the-record property for any subset, then the size of the generated watermarks must be $\Omega(n)$, where $n$ is the number of designated detectors (i.e., $|S|=n$).
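
To unpack the asymptotic notation, write $\ell(n)$ for the watermark size when there are $n$ designated detectors. The symbols $\ell$, $c$, and $n_0$ are introduced here for illustration only and are not taken from the paper. Under that assumption, the bound says the watermark size grows at least linearly in $n$:

$$\exists\, c > 0,\ n_0 \in \mathbb{N} \ \text{ such that } \ \ell(n) \ge c \cdot n \quad \text{for all } n \ge n_0.$$

Read this way, a scheme with both properties cannot keep its watermark size constant, or even sublinear, as more detectors are designated.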

Figures (17)

  • Figure 1: The MDDW framework based on MDVS
  • Figure 2: Games $\textup{G}^{\textup{cons}}_{\textup{MDVS},\mathcal{A}}(\lambda)$ and $\textup{G}^{\textup{unforg}}_{\textup{MDVS},\mathcal{A}}(\lambda)$ for MDVS; the oracles are given in Fig. 3
  • Figure 3: The oracles for the games defining security notions for MDVS
  • Figure 4: Games $\textup{G}^{\textup{otr-ds}}_{\textup{MDVS},\mathcal{A},\textup{FgeDS}}(\lambda)$ and $\textup{G}^{\textup{otr-as}}_{\textup{MDVS},\mathcal{A},\textup{FgeAS}}(\lambda)$ for MDVS; the oracles are given in Fig. 3
  • Figure 5: Games $\textup{G}^{\textup{cons}}_{\textup{MDDW},\mathcal{A}}(\lambda)$, $\textup{G}^{\textup{sound}}_{\textup{MDDW},\mathcal{A}}(\lambda)$ and $\textup{G}^{\textup{dist-fr}}_{\textup{MDDW},\mathcal{A}}(\lambda)$ for MDDW; the oracles are given in Fig. \ref{fig:MDDW_oracle}
  • ...and 12 more figures

Theorems & Definitions (52)

  • Theorem
  • Proof
  • Definition: Auto-regressive model
  • Definition: Correctness
  • Definition: Consistency
  • Definition: Unforgeability
  • Definition: Off-the-record for designated set
  • Definition: Off-the-record for any subset
  • Definition: Completeness
  • Definition: Consistency
  • ...and 42 more