Tracking the Copyright of Large Vision-Language Models through Parameter Learning Adversarial Images

Yubo Wang, Jianting Tang, Chaohu Liu, Linli Xu

TL;DR

To address copyright infringement through unauthorized fine-tuning of LVLMs, the authors propose Parameter Learning Attack (PLA). PLA generates trigger images by running a targeted adversarial attack on the original model while simultaneously updating the model's parameters in the opposite direction, so that the resulting triggers remain effective on fine-tuned derivatives; the published model itself is left unmodified. The method pairs each trigger image with a rare question–answer pair, so that both the original and derivative models output a predetermined target, and effectiveness is measured by Target Match Rate (TMR). Experiments on LLaVA-1.5 across six downstream fine-tuning scenarios show that PLA outperforms backdoor-based and ordinary adversarial baselines and is robust to input transformations and parameter perturbations. The work provides a practical, post-release copyright-protection mechanism for LVLMs and reports that it generalizes to multiple LVLM architectures and fine-tuning strategies.
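As a rough illustration of how a metric like TMR could be computed, the sketch below counts the share of trigger queries whose response matches the target. The function name `target_match_rate`, the case-insensitive substring matching rule, and the example responses are all assumptions made for illustration; the paper may define a match differently. The target text is taken from the QA pair quoted in the Figure 5 caption.

```python
def target_match_rate(responses, target):
    """Fraction of responses that contain the target text.

    Assumption: a "match" is a case-insensitive substring match.
    """
    if not responses:
        return 0.0
    target_norm = target.strip().lower()
    hits = sum(1 for r in responses if target_norm in r.strip().lower())
    return hits / len(responses)


# Hypothetical responses of a suspect model to four trigger queries.
responses = [
    "ICLR Conference.",
    "A cat sitting on a sofa.",
    "iclr conference",
    "The image shows a dog.",
]
tmr = target_match_rate(responses, "ICLR Conference")
print(f"TMR = {tmr:.2f}")
```

Under this reading, a high TMR on a suspect model suggests it derives from the protected model, while an unrelated model should score near zero.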

Abstract

Large vision-language models (LVLMs) have demonstrated remarkable image understanding and dialogue capabilities, allowing them to handle a variety of visual question answering tasks. However, their widespread availability raises concerns about unauthorized usage and copyright infringement, where users or individuals can develop their own LVLMs by fine-tuning published models. In this paper, we propose a novel method called Parameter Learning Attack (PLA) for tracking the copyright of LVLMs without modifying the original model. Specifically, we construct adversarial images through targeted attacks against the original model, enabling it to generate specific outputs. To ensure these attacks remain effective on potential fine-tuned models to trigger copyright tracking, we allow the original model to learn the trigger images by updating parameters in the opposite direction during the adversarial attack process. Notably, the proposed method can be applied after the release of the original model, thus not affecting the model's performance and behavior. To simulate real-world applications, we fine-tune the original model using various strategies across diverse datasets, creating a range of models for copyright verification. Extensive experiments demonstrate that our method can more effectively identify the original copyright of fine-tuned models compared to baseline methods. Therefore, this work provides a powerful tool for tracking copyrights and detecting unlicensed usage of LVLMs.
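The opposing updates described in the abstract can be illustrated on a toy problem. The sketch below is not the authors' implementation: it swaps the LVLM for a four-weight logistic model, uses a signed-gradient (PGD-style) image step as an assumed attack, and the names `pla_trigger`, `eps`, `alpha`, `beta` and all numeric values are hypothetical. The image takes steps that lower the target loss, while a working copy of the weights takes steps that raise it; the released weights are never changed.

```python
import math


def target_loss_and_grads(w, x):
    """Target-class cross-entropy of a toy logistic model, plus gradients."""
    z = sum(wi * xi for wi, xi in zip(w, x))
    p = 1.0 / (1.0 + math.exp(-z))        # probability of the target output
    loss = -math.log(max(p, 1e-12))
    dz = p - 1.0                          # d(loss)/dz
    grad_x = [dz * wi for wi in w]        # gradient w.r.t. the image
    grad_w = [dz * xi for xi in x]        # gradient w.r.t. the weights
    return loss, grad_x, grad_w


def sign(v):
    return (v > 0) - (v < 0)


def pla_trigger(w_released, x_clean, eps=0.1, alpha=0.01, beta=0.01, steps=50):
    """Build a trigger image against a working copy of the weights."""
    w = list(w_released)                  # working copy; released weights stay intact
    x = list(x_clean)
    for _ in range(steps):
        _, grad_x, grad_w = target_loss_and_grads(w, x)
        # Image step: descend on the target loss (signed gradient).
        x = [xi - alpha * sign(g) for xi, g in zip(x, grad_x)]
        # Keep the image within eps of the clean image and inside [0, 1].
        x = [min(max(xi, ci - eps, 0.0), ci + eps, 1.0)
             for xi, ci in zip(x, x_clean)]
        # Model step: ascend on the same loss (the "opposite direction").
        w = [wi + beta * g for wi, g in zip(w, grad_w)]
    return x


# Hypothetical released weights and clean image.
w_released = [0.8, -0.6, 0.5, -0.9]
x_clean = [0.5, 0.5, 0.5, 0.5]

trigger = pla_trigger(w_released, x_clean)
loss_clean, _, _ = target_loss_and_grads(w_released, x_clean)
loss_trigger, _, _ = target_loss_and_grads(w_released, trigger)
print(f"target loss: clean={loss_clean:.3f}, trigger={loss_trigger:.3f}")
```

Because the image must keep lowering the loss against weights that are actively resisting, the intent is that the resulting trigger survives later parameter changes such as fine-tuning. The toy model only illustrates the update directions and cannot show that robustness itself.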

Paper Structure

This paper contains 29 sections, 7 equations, 12 figures, 12 tables, 1 algorithm.

Figures (12)

  • Figure 1: The pipeline of trigger construction and copyright verification. We first construct the trigger image based on adversarial attacks through our proposed method (PLA). Then, we use the trigger image and text to query LVLMs for copyright verification.
  • Figure 2: The overview of our proposed method for copyright tracking. (a) We employ Parameter Learning Attack (PLA) to construct trigger images, where the model’s parameters are updated during the adversarial attack process to maximize the cross-entropy loss between the model's output and the trigger target. (b) We design rare question-answer pairs and ensure they are infrequent in downstream task datasets. (c) We fine-tune the original LVLM on various downstream tasks and then use the constructed triggers to track their copyright to validate the effectiveness of our method.
  • Figure 3: The comparison of different adversarial attacks. Compared to the ordinary attack, RNA introduces slight noise to the model, while PLA allows for model parameter updates.
  • Figure 4: Comparison of responses from LLaVA, different fine-tuned models, and unrelated LVLMs when queried with clean images and trigger images, where "*" denotes the fine-tuned models on specific datasets.
  • Figure 5: Ablation results with a single QA pair "Q: Detecting copyright. A: ICLR Conference." (a) The impact of model learning rate in PLA on tracking performance. (b) The relationship between tracking performance and model fine-tuning epochs. (c) The effect of the number of fine-tuning samples on tracking performance.
  • ...and 7 more figures