Echoes of Ownership: Adversarial-Guided Dual Injection for Copyright Protection in MLLMs

Chengwei Xia, Fan Ma, Ruijie Quan, Yunqiu Xu, Kun Zhan, Yi Yang

TL;DR

This work tackles copyright protection for open-source multimodal LLMs under black-box constraints. It introduces Adversarial-Guided Dual-Injection (AGDI), which generates a trigger image through adversarial optimization. Two injections, one at the response level and one at the semantic level, tie the image to a rare trigger QA pair that signals ownership across derivative models, while an adversarially trained auxiliary model simulates the resistance of downstream fine-tuned models. By exploiting the CLIP-like cross-modal alignment module present in many MLLMs, AGDI generalizes across derivatives and remains robust to pruning, merging, and quantization, enabling ownership verification after deployment. The framework reports stronger tracking performance than baselines across diverse fine-tuned variants, suggesting practical utility for safeguarding open-source MLLMs.

Abstract

With the rapid deployment and widespread adoption of multimodal large language models (MLLMs), disputes regarding model version attribution and ownership have become increasingly frequent, raising significant concerns about intellectual property protection. In this paper, we propose a framework for generating copyright triggers for MLLMs, enabling model publishers to embed verifiable ownership information into the model. The goal is to construct trigger images that elicit ownership-related textual responses exclusively in fine-tuned derivatives of the original model, while remaining inert in non-derivative models. Our method constructs a tracking trigger image by treating the image as a learnable tensor and performing adversarial optimization with dual injection of ownership-relevant semantic information. The first injection is achieved by enforcing textual consistency between the output of an auxiliary MLLM and a predefined ownership-relevant target text; the consistency loss is backpropagated to inject this ownership-related information into the image. The second injection is performed at the semantic level by minimizing the distance between the CLIP features of the image and those of the target text. Furthermore, we introduce an additional adversarial training stage involving an auxiliary model derived from the original model itself. This auxiliary model is specifically trained to resist generating the ownership-relevant target text, thereby enhancing robustness in heavily fine-tuned derivative models. Extensive experiments demonstrate the effectiveness of our dual-injection approach in tracking model lineage under various fine-tuning and domain-shift scenarios.
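
The dual-injection idea described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the "auxiliary model" here is a hypothetical linear scorer `W`, the "CLIP features" are a hypothetical linear projection `P`, the semantic distance is assumed to be squared Euclidean, and the signed-gradient update with an L-infinity budget `eps` is an assumed, PGD-style choice. The adversarial training stage of the auxiliary model is omitted. All function and parameter names are made up for illustration.

```python
import math


def matvec(M, v):
    """Return M @ v for a list-of-rows matrix M."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def mat_t_vec(M, v):
    """Return M^T @ v for a list-of-rows matrix M."""
    return [sum(M[i][j] * v[i] for i in range(len(M))) for j in range(len(M[0]))]


def softmax(z):
    m = max(z)
    e = [math.exp(a - m) for a in z]
    s = sum(e)
    return [a / s for a in e]


def dual_loss_and_grad(x, W, target_id, P, t, lam):
    """Loss and gradient w.r.t. the image x for both injections (toy models)."""
    # Response-level injection (assumed form): cross-entropy between the toy
    # auxiliary model's output distribution and the target token.
    p = softmax(matvec(W, x))
    resp = -math.log(p[target_id])
    d_logits = [pi - (1.0 if i == target_id else 0.0) for i, pi in enumerate(p)]
    g_resp = mat_t_vec(W, d_logits)
    # Semantic-level injection (assumed form): squared distance between the
    # projected image features and the target text features t.
    diff = [a - b for a, b in zip(matvec(P, x), t)]
    sem = sum(d * d for d in diff)
    g_sem = mat_t_vec(P, [2.0 * d for d in diff])
    loss = resp + lam * sem
    grad = [a + lam * b for a, b in zip(g_resp, g_sem)]
    return loss, grad


def optimize_trigger(x0, W, target_id, P, t, eps=0.1, alpha=0.01, steps=50, lam=1.0):
    """Signed-gradient descent on the image, kept within eps of x0 and in [0, 1]."""
    x = list(x0)
    best_x, best_loss = list(x0), float("inf")
    # steps + 1 evaluations: the first one scores the unmodified image x0.
    for _ in range(steps + 1):
        loss, grad = dual_loss_and_grad(x, W, target_id, P, t, lam)
        if loss < best_loss:
            best_x, best_loss = list(x), loss
        stepped = [xi - alpha * ((g > 0) - (g < 0)) for xi, g in zip(x, grad)]
        # Project back into the perturbation budget and the valid pixel range.
        x = [min(max(s, max(0.0, b - eps)), min(1.0, b + eps))
             for s, b in zip(stepped, x0)]
    return best_x, best_loss
```

In this sketch the image is the only quantity being updated; the toy model weights, the target token, and the target text features stay fixed, mirroring the abstract's description of the image as a learnable tensor.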

Paper Structure (34 sections, 8 equations, 12 figures, 18 tables, 1 algorithm)

Figures (12)

  • Figure 1: Overview of copyright tracking for MLLMs. (a) The publisher releases an MLLM, but (b) infringement by a malicious user creates an urgent need for copyright protection. (c) We design a trigger question-answer pair and generate a trigger image using a dual-injection method for copyright tracking.
  • Figure 2: The pipeline of our proposed method for copyright tracking. During optimization, the trigger question and target answer are fixed while the trigger image is updated to align with the target. We optimize the trigger image with an adversarial-guided dual-injection mechanism that injects verifiable ownership-related target information. The first injection enforces response-level alignment between the auxiliary model's output and the target, while the second injection minimizes the cross-modal semantic embedding distance. We incorporate adversarial training of the auxiliary model to enhance the robustness of trigger images on derivative models.
  • Figure 3: Comparison of responses to trigger images from non-derivative and derivative models, and responses to clean images from derivative models. Trigger and clean images are queried with the same trigger questions.
  • Figure 4: ASR comparison between AGDI and baselines under 8-bit quantization: (a) five fine-tuned variants of LLaVA-1.5; (b) five fine-tuned variants of Qwen2-VL.
  • Figure 5: Hyperparameter analysis for a single trigger question-answer pair ("Q: Detecting copyright. A: ICLR Conference") on the LLaVA-1.5 fine-tuned models. (a) The impact of model learning rate on tracking performance. (b) The impact of optimization steps on tracking performance. (c) The impact of perturbation budget on tracking performance.
  • ...and 7 more figures
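
Figure 4 reports ASR, which this excerpt does not define. Assuming it denotes the fraction of trigger queries for which a model's response contains the target answer, a minimal sketch of the metric could look as follows. The function name and the case-insensitive substring match are assumptions, not the paper's evaluation protocol.

```python
def attack_success_rate(responses, target_answer):
    """Fraction of responses containing the target answer (assumed ASR definition)."""
    if not responses:
        return 0.0
    target = target_answer.strip().lower()
    hits = sum(1 for r in responses if target in r.lower())
    return hits / len(responses)


# Hypothetical responses to a trigger image, using the target from Figure 5.
example = [
    "ICLR Conference",
    "A cat sitting on a sofa.",
    "It is the iclr conference.",
]
rate = attack_success_rate(example, "ICLR Conference")
```

Under this definition a derivative model would be expected to score high on trigger images, while a non-derivative model, or a derivative model shown clean images, would score low.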