GuardSplat: Efficient and Robust Watermarking for 3D Gaussian Splatting

Zixuan Chen, Guangcong Wang, Jiahao Zhu, Jianhuang Lai, Xiaohua Xie

TL;DR

GuardSplat addresses the need for practical copyright protection of 3D Gaussian Splatting (3DGS) assets by embedding large-capacity watermarks into spherical harmonic (SH) features with a CLIP-guided, decoupled decoder. It advances three core ideas: (i) CLIP-guided Message Decoupling Optimization to train a compact, general-purpose decoder; (ii) SH-aware Message Embedding that minimally perturbs the SH features, preserving geometry while enabling watermarking; and (iii) an Anti-distortion Message Extraction module that makes watermark retrieval robust to common distortions of the rendered views. On the Blender and LLFF datasets, it demonstrates superior capacity, invisibility, and robustness relative to state-of-the-art methods, with fast optimization times suitable for real-world workflows. The approach offers a practical, secure, and scalable way to protect 3DGS assets in professional pipelines.

Abstract

3D Gaussian Splatting (3DGS) has recently created impressive 3D assets for various applications. However, considering security, capacity, invisibility, and training efficiency, the copyright of 3DGS assets is not well protected as existing watermarking methods are unsuited for its rendering pipeline. In this paper, we propose GuardSplat, an innovative and efficient framework for watermarking 3DGS assets. Specifically, 1) We propose a CLIP-guided pipeline for optimizing the message decoder with minimal costs. The key objective is to achieve high-accuracy extraction by leveraging CLIP's aligning capability and rich representations, demonstrating exceptional capacity and efficiency. 2) We tailor a Spherical-Harmonic-aware (SH-aware) Message Embedding module for 3DGS, seamlessly embedding messages into the SH features of each 3D Gaussian while preserving the original 3D structure. This enables watermarking 3DGS assets with minimal fidelity trade-offs and prevents malicious users from removing the watermarks from the model files, meeting the demands for invisibility and security. 3) We present an Anti-distortion Message Extraction module to improve robustness against various distortions. Experiments demonstrate that GuardSplat outperforms state-of-the-art methods and achieves fast optimization speed. The project page is at https://narcissusex.github.io/GuardSplat, and the code is at https://github.com/NarcissusEx/GuardSplat.
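
The abstract stresses high-accuracy message extraction. One common way to quantify this is bit accuracy, the fraction of embedded bits that the decoder recovers. The sketch below assumes that metric (this section does not name it), and the function name `bit_accuracy`, the 0.5 threshold, and the sample numbers are all hypothetical illustrations rather than anything taken from the GuardSplat code.

```python
from typing import Sequence


def bit_accuracy(
    embedded: Sequence[int],
    decoded_probs: Sequence[float],
    threshold: float = 0.5,
) -> float:
    """Fraction of message bits recovered correctly after thresholding."""
    if len(embedded) != len(decoded_probs):
        raise ValueError("message and decoder output must have the same length")
    if not embedded:
        raise ValueError("message must contain at least one bit")
    # Turn per-bit probabilities into hard 0/1 decisions.
    recovered = [1 if p >= threshold else 0 for p in decoded_probs]
    matches = sum(int(a == b) for a, b in zip(embedded, recovered))
    return matches / len(embedded)


# Hypothetical 8-bit message and decoder outputs (bits 4 and 8 are wrong).
message = [1, 0, 1, 1, 0, 0, 1, 0]
probs = [0.9, 0.2, 0.7, 0.4, 0.1, 0.3, 0.8, 0.6]
print(bit_accuracy(message, probs))  # 0.75
```

A value of 0.5 corresponds to chance level for a random binary message, so useful watermarking needs accuracy well above that.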

Paper Structure

This paper contains 22 sections, 10 equations, 13 figures, 6 tables.

Figures (13)

  • Figure 1: Application scenarios of GuardSplat. To protect the copyright of 3D Gaussian Splatting (3DGS) assets, (a) the owners (Alice) can use our GuardSplat to embed the secret message (blue key) into these models. (b) If malicious users (Bob) render views for unauthorized uses, (c) Alice can use the private message decoder to extract messages (purple key) for copyright identification.
  • Figure 2: Comparisons of four 3D watermarking frameworks. They differ in how they embed messages and train message decoders. (a) Directly training 3D models on the watermarked images. (b) Simultaneously training a 3D model and a message decoder. (c) Employing the message decoder from a 2D watermarker for optimization. (d) GuardSplat first trains a message decoder to extract messages from CLIP textual features. This message decoder can then be applied to the CLIP visual features for watermarking 3D models via optimization.
  • Figure 3: Performance of state-of-the-art methods with $N_L=32$ bits on the Blender and LLFF datasets. The radius of each circle is proportional to the method's total training time (decoder optimization + watermarking), evaluated on an RTX 3090 GPU.
  • Figure 4: Overview of GuardSplat. (a) Given a binary message $M\in\{0, 1\}^{L}_{i=1}$, we first transform it into CLIP tokens $T$ using the proposed message tokenization. We then employ CLIP's textual encoder $\mathcal{E_T}$ to map $T$ to the textual feature $F_\mathcal{T}$. Finally, we feed $F_\mathcal{T}$ into message decoder $\mathcal{D_M}$ to extract the message $\hat{M}\in\{0, 1\}^{L}_{i=1}$ for optimization. (b) For each 3D Gaussian, we freeze all the attributes and build a learnable spherical harmonic (SH) offset $\boldsymbol{h}^o_i$ as the watermarked SH feature, which can be added to the original SH features as $\boldsymbol{h}_i + \boldsymbol{h}^o_i$ to render the watermarked views. (c) We first feed the 2D rendered views to CLIP's visual encoder $\mathcal{E_V}$ to acquire the visual feature $F_{\mathcal{V}}$ and then employ the pre-trained message decoder to extract the message $\hat{M}$. A differentiable distortion layer is used to simulate various visual distortions during optimization. $\mathcal{D_M}$ and $\boldsymbol{h}^o_i$ are optimized with the paper's message-loss equation and its loss equation (labeled eq:message_loss and eq:loss in the source), respectively.
  • Figure 5: ROC curves produced by varying thresholds in StegExpose on different methods. The closer the curve is to the "Reference", the more effective the method is regarding security.
  • ...and 8 more figures
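
The Figure 4(b) caption describes freezing every Gaussian attribute and learning only an SH offset $\boldsymbol{h}^o_i$ that is added to the original $\boldsymbol{h}_i$. A minimal sketch of that idea follows, under heavy assumptions: `toy_decode` is a hypothetical linear stand-in for the real chain of differentiable rendering, CLIP's visual encoder, and the pre-trained message decoder; `visibility`, `weights`, the learning rate, and the L2 penalty are invented for illustration; and the loss is a plain binary cross-entropy, not the paper's loss equations.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def toy_decode(sh, offsets, visibility, weights):
    """Toy stand-in for rendering + visual encoder + message decoder.

    Blends the watermarked SH vectors (h_i + h^o_i) with fixed visibility
    weights, then applies a fixed linear map to get one logit per bit.
    """
    k = len(sh[0])
    blended = [
        sum(a * (h[c] + o[c]) for a, h, o in zip(visibility, sh, offsets))
        for c in range(k)
    ]
    return [sum(w[c] * blended[c] for c in range(k)) for w in weights]


def embed_message(sh, visibility, weights, message, steps=300, lr=0.5, reg=0.1):
    """Learn per-Gaussian SH offsets by gradient descent; `sh` is never modified."""
    k = len(sh[0])
    offsets = [[0.0] * k for _ in sh]  # learnable h^o_i, initialised to zero
    for _ in range(steps):
        logits = toy_decode(sh, offsets, visibility, weights)
        # Gradient of binary cross-entropy w.r.t. each logit: sigmoid(z) - bit.
        err = [sigmoid(z) - m for z, m in zip(logits, message)]
        # Chain rule back to the blended SH vector.
        g = [sum(e * w[c] for e, w in zip(err, weights)) for c in range(k)]
        for a, o in zip(visibility, offsets):
            for c in range(k):
                # Message gradient plus an L2 penalty that keeps offsets small.
                o[c] -= lr * (a * g[c] + reg * o[c])
    return offsets


# Three Gaussians with four SH coefficients each (made-up numbers).
sh = [[0.5, -0.5, 0.5, -0.5], [0.4, -0.6, 0.6, -0.4], [0.6, -0.4, 0.4, -0.6]]
visibility = [0.5, 0.3, 0.2]
weights = [[4.0 if r == c else 0.0 for c in range(4)] for r in range(4)]
message = [0, 1, 1, 0]

original = [row[:] for row in sh]
offsets = embed_message(sh, visibility, weights, message)
decoded = [1 if z > 0 else 0 for z in toy_decode(sh, offsets, visibility, weights)]
print(decoded == message, sh == original)
```

Because only `offsets` is updated, the stored SH features and every other attribute stay untouched, which mirrors the caption's claim that the original 3D structure is preserved. In this toy, more visible Gaussians receive larger offsets because their gradient is scaled by `visibility`.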