
SLICE: Semantic Latent Injection via Compartmentalized Embedding for Image Watermarking

Zheng Gao, Yifan Yang, Xiaoyu Li, Xiaoyan Feng, Haoran Fan, Yang Song, Jiaojiao Jiang

Abstract

Watermarking the initial noise of diffusion models has emerged as a promising approach for image provenance, but content-independent noise patterns can be forged via inversion and regeneration attacks. Recent semantic-aware watermarking methods improve robustness by conditioning verification on image semantics. However, their reliance on a single global semantic binding makes them vulnerable to localized but globally coherent semantic edits. To address this limitation and provide a trustworthy semantic-aware watermark, we propose Semantic Latent Injection via Compartmentalized Embedding (SLICE). Our framework decouples image semantics into four semantic factors (subject, environment, action, and detail) and precisely anchors each to a distinct region of the initial Gaussian noise. This fine-grained semantic binding enables verification in which semantic tampering is both detectable and localizable. We theoretically justify why SLICE enables robust and reliable tamper localization, and we provide statistical guarantees on its false-accept rate. Experimental results demonstrate that SLICE significantly outperforms existing baselines against advanced semantic-guided regeneration attacks, substantially reducing attack success while preserving image quality and semantic fidelity. Overall, SLICE offers a practical, training-free provenance solution that is both fine-grained in diagnosis and robust to realistic adversarial manipulations.
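To make the idea of compartmentalized binding concrete, here is a minimal Python sketch built on several assumptions of ours: a 4x64x64 latent (the Stable Diffusion latent size), one spatial quadrant per semantic factor, per-region noise seeded by SHA-256 over a secret key plus the factor's text descriptor, and verification by cosine similarity between the inverted latent and the regenerated noise. The functions `embed`, `verify`, `region_noise` and the per-factor thresholds `tau` are hypothetical illustrations, not SLICE's actual region layout, keying scheme, or scoring rule. Descriptor extraction and diffusion inversion are replaced by given strings and an additive perturbation.

```python
import hashlib
import numpy as np

FACTORS = ("subject", "environment", "action", "detail")

def region_slices(h, w):
    """Split an h x w latent into four quadrants, one per factor (assumed layout)."""
    mh, mw = h // 2, w // 2
    return [(slice(0, mh), slice(0, mw)), (slice(0, mh), slice(mw, w)),
            (slice(mh, h), slice(0, mw)), (slice(mh, h), slice(mw, w))]

def region_noise(key, factor, descriptor, shape):
    """Deterministic standard-Gaussian noise keyed by (secret key, factor, descriptor text)."""
    msg = key + b"|" + factor.encode() + b"|" + descriptor.encode()
    seed = int.from_bytes(hashlib.sha256(msg).digest()[:8], "big")
    return np.random.default_rng(seed).standard_normal(shape)

def embed(key, descriptors, shape=(4, 64, 64)):
    """Assemble an initial latent whose quadrants are each bound to one factor's descriptor."""
    latent = np.empty(shape)
    for factor, (rs, cs) in zip(FACTORS, region_slices(shape[1], shape[2])):
        region_shape = latent[:, rs, cs].shape
        latent[:, rs, cs] = region_noise(key, factor, descriptors[factor], region_shape)
    return latent

def cosine(a, b):
    """Cosine similarity between two arrays, treated as flat vectors."""
    return float(np.dot(a.ravel(), b.ravel()) / (np.linalg.norm(a) * np.linalg.norm(b)))

def verify(key, latent_hat, descriptors, tau):
    """Score each region against noise regenerated from the re-extracted descriptors.

    Returns per-factor scores and the factors whose score falls below tau[factor].
    """
    scores = {}
    for factor, (rs, cs) in zip(FACTORS, region_slices(*latent_hat.shape[1:])):
        region = latent_hat[:, rs, cs]
        expected = region_noise(key, factor, descriptors[factor], region.shape)
        scores[factor] = cosine(region, expected)
    flagged = [f for f in FACTORS if scores[f] < tau[f]]
    return scores, flagged

if __name__ == "__main__":
    key = b"owner-secret"
    desc = {"subject": "a red fox", "environment": "snowy forest at dusk",
            "action": "leaping over a log", "detail": "soft rim lighting"}
    z0 = embed(key, desc)
    # Stand-in for inversion error: a small additive Gaussian perturbation.
    z_hat = z0 + 0.3 * np.random.default_rng(0).standard_normal(z0.shape)
    tau = {f: 0.5 for f in FACTORS}
    print(verify(key, z_hat, desc, tau)[1])                                # expected: []
    print(verify(key, z_hat, dict(desc, subject="a grey wolf"), tau)[1])   # expected: ['subject']
```

Because every quadrant is filled with standard Gaussian draws, the assembled latent still looks like ordinary initial noise to the sampler, while a change to one factor's descriptor alters the expected noise only in that factor's region, which is what makes the mismatch localizable. Hashing exact descriptor strings is brittle, however: Figure 3 measures extraction stability through embedding similarity, so a practical binding would likely need to tolerate small wording changes that this sketch does not.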

Paper Structure (21 sections, 5 theorems, 28 equations, 5 figures, 4 tables)

Key Result

Theorem 4.3

Let $\mathcal{J} \subseteq \mathcal{K}$ be the set of tampered semantic factors. Assume that Assumptions as:bound_err and as:sem_pertb hold. If the set of local thresholds $\{\tau_k\}_{k\in\mathcal{K}}$ satisfies $\tau_k \geq \epsilon_k + \delta_k$ for all $k \in \mathcal{K} \setminus \mathcal{J}$ and … We write $a_+ = \max\{a, 0\}$ for any $a \in \mathbb{R}$.

Figures (5)

  • Figure 1: The overall framework of SLICE.
  • Figure 2: Structure of the Meta-Prompt $\mathcal{P}_{\mathrm{meta}}$.
  • Figure 3: Semantic extraction stability across prompt languages. The axes represent text embedding cosine similarity between initial and re-extracted descriptors.
  • Figure 4: Qualitative comparison of visual fidelity with and without SLICE watermarking.
  • Figure 5: Case study of the proposed multi-granularity verification mechanism.

Theorems & Definitions (7)

  • Theorem 4.3: Robust localization under partial corruption
  • Theorem 4.4: Exponential false-accept bound for keyless or unwatermarked inputs
  • Theorem A.1: Restatement of Theorem 4.3
  • Proof
  • Lemma B.1: Chernoff bounds, Theorem 2.17 in zhang2023mathematical
  • Theorem B.2: Restatement of Theorem 4.4
  • Proof
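Lemma B.1 indicates that the false-accept guarantee of Theorem 4.4 rests on a Chernoff bound. As an illustration only, suppose verification counts how many of n watermark bits agree with the key and accepts when the agreement fraction reaches a threshold t > 1/2, and suppose that for a keyless or unwatermarked input each bit agrees independently with probability 1/2. This bit-agreement model is our assumption, not the paper's stated setup, and the bound in Lemma B.1 may take a different form. Under this model the exact false-accept probability is a binomial upper tail, which the relative-entropy Chernoff bound exp(-n D(t || 1/2)) and the looser Hoeffding form exp(-2n (t - 1/2)^2) both cap at a rate exponential in n. The function names below are hypothetical.

```python
import math

def exact_false_accept(n: int, t: float) -> float:
    """P[Binomial(n, 1/2) >= ceil(t * n)], computed with exact integer arithmetic."""
    k0 = math.ceil(t * n)
    hits = sum(math.comb(n, k) for k in range(k0, n + 1))
    return hits / 2**n

def kl_bernoulli(a: float, p: float) -> float:
    """Relative entropy D(a || p) between Bernoulli(a) and Bernoulli(p), in nats."""
    total = 0.0
    if a > 0:
        total += a * math.log(a / p)
    if a < 1:
        total += (1 - a) * math.log((1 - a) / (1 - p))
    return total

def _check_threshold(t: float) -> None:
    """Both bounds below are only meaningful for thresholds above the chance rate 1/2."""
    if not 0.5 < t <= 1.0:
        raise ValueError("threshold t must lie in (1/2, 1]")

def chernoff_bound(n: int, t: float) -> float:
    """Chernoff bound: P[X >= t n] <= exp(-n D(t || 1/2)) for X ~ Binomial(n, 1/2)."""
    _check_threshold(t)
    return math.exp(-n * kl_bernoulli(t, 0.5))

def hoeffding_bound(n: int, t: float) -> float:
    """Hoeffding form exp(-2 n (t - 1/2)^2); by Pinsker it is never tighter than Chernoff."""
    _check_threshold(t)
    return math.exp(-2 * n * (t - 0.5) ** 2)

if __name__ == "__main__":
    t = 0.75
    for n in (64, 128, 256):
        print(n, exact_false_accept(n, t), chernoff_bound(n, t), hoeffding_bound(n, t))
```

Pinsker's inequality, D(t || 1/2) >= 2(t - 1/2)^2, guarantees the ordering exact tail <= Chernoff <= Hoeffding, so the relative-entropy form is the natural one to use when sizing n for a target false-accept rate.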