Safeguarding Multimodal Knowledge Copyright in the RAG-as-a-Service Environment

Tianyu Chen, Jian Lou, Wenjie Wang

TL;DR

This paper proposes AQUA, the first watermark framework for image knowledge protection in Multimodal RAG systems. AQUA embeds semantic signals into synthetic images using two complementary methods: acronym-based triggers and spatial relationship cues.

Abstract

As Retrieval-Augmented Generation (RAG) evolves into service-oriented platforms (RAG-as-a-Service) with shared knowledge bases, protecting the copyright of contributed data becomes essential. Existing watermarking methods in RAG focus solely on textual knowledge, leaving image knowledge unprotected. In this work, we propose AQUA, the first watermark framework for image knowledge protection in Multimodal RAG systems. AQUA embeds semantic signals into synthetic images using two complementary methods: acronym-based triggers and spatial relationship cues. These techniques ensure that watermark signals survive indirect propagation from the image retriever to the textual generator, while remaining efficient, effective, and imperceptible. Experiments across diverse models and datasets show that AQUA enables robust, stealthy, and reliable copyright tracing, filling a key gap in multimodal RAG protection.

Paper Structure

This paper contains 92 sections, 14 equations, 16 figures, 32 tables.

Figures (16)

  • Figure 1: Overview of the RAG-as-a-Service (RaaS) workflow. Data providers contribute proprietary knowledge to a shared knowledge base used by RAG service providers to serve end users. Data providers can issue watermark probe queries to RAG services. If the watermark is detected in an unauthorized RAG service, it indicates unauthorized use.
  • Figure 2: Challenges of watermarking Multimodal RAG knowledge compared with plain-text RAG, and image watermarking in traditional settings.
  • Figure 3: Illustration of the watermark injection (left) and verification (right) of AQUA.
  • Figure 3: FPR represents the proportion of non-watermarked images incorrectly identified as watermarked, while TPR is the proportion of watermarked images that are correctly identified.
  • Figure 4: Diagnostics of AQUA watermark detection.
  • ...and 11 more figures