Can Large Vision-Language Models Detect Images Copyright Infringement from GenAI?

Qipan Xu, Zhenting Wang, Xiaoxiao He, Ligong Han, Ruixiang Tang

TL;DR

This paper investigates whether large vision-language models (LVLMs) can detect intellectual property (IP) infringement in images generated by GenAI. It introduces a benchmark dataset with five iconic IP characters, containing both positive infringing samples and ambiguously non-infringing negatives, generated via multiple diffusion models and refined with prompt engineering. The authors evaluate seven LVLMs under in-context learning (ICL) and zero-shot VQA, revealing high recall but low precision and a tendency to misclassify ambiguous negatives as infringements, indicating overfitting to superficial features. They analyze failure cases and propose mitigation strategies, including contrastive learning, to improve robustness, highlighting the need for dedicated benchmarks to enable safer, legally aligned deployment of LVLM-based content moderation tools.

Abstract

Generative AI models, renowned for their ability to synthesize high-quality content, have sparked growing concerns over the improper generation of copyright-protected material. While recent studies have proposed various approaches to address copyright issues, the capability of large vision-language models (LVLMs) to detect copyright infringements remains largely unexplored. In this work, we focus on evaluating the copyright detection abilities of state-of-the-art LVLMs using a diverse set of image samples. Recognizing the absence of a comprehensive dataset that includes both IP-infringement samples and ambiguous non-infringement negative samples, we construct a benchmark dataset comprising positive samples that violate the copyright protection of well-known IP figures, as well as negative samples that resemble these figures but do not raise copyright concerns. This dataset is created using advanced prompt engineering techniques. We then evaluate leading LVLMs using our benchmark dataset. Our experimental results reveal that LVLMs are prone to overfitting, leading to the misclassification of some negative samples as IP-infringement cases. In the final section, we analyze these failure cases and propose potential solutions to mitigate the overfitting problem.
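
The summary above reports high recall but low precision when ambiguous negatives are flagged as infringements. The following minimal sketch shows how that pattern arises in the two metrics. The labels and predictions are hypothetical toy values chosen for illustration, not results from the paper, and the helper name `precision_recall` is an assumption rather than anything the authors define.

```python
def precision_recall(y_true, y_pred):
    """Compute precision and recall for binary labels (1 = infringement)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # infringing, flagged
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # non-infringing, flagged
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # infringing, missed
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical toy data: 4 positive samples and 4 ambiguous negatives.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
# A detector that flags every positive but also 3 of the 4 look-alike negatives.
y_pred = [1, 1, 1, 1, 1, 1, 1, 0]

precision, recall = precision_recall(y_true, y_pred)
print(f"precision={precision:.3f} recall={recall:.3f}")
```

In this toy case no infringing image is missed, so recall is perfect, while the three false alarms on look-alike negatives pull precision down to 4/7.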

Paper Structure

This paper contains 24 sections, 2 equations, 6 figures, 7 tables, 1 algorithm.

Figures (6)

  • Figure 1: Overview of positive and negative samples in the dataset. First row: positive samples. Second row: negative samples. Copyright-protected characters from left to right: Iron-Man, Batman, Superman, Spider-Man and Super-Mario.
  • Figure 2: Generated positive samples with direct prompt from Stable Diffusion XL. Copyright-protected characters from left to right: Iron-Man, Batman, Superman, Spider-Man and Super-Mario.
  • Figure 3: Generated positive samples with descriptive prompt (w/o character's name) from Ideogram AI. Copyright-protected characters from left to right: Iron-Man, Batman, Superman, Spider-Man and Super-Mario.
  • Figure 4: Generated negative samples with character name as negative prompts from DALL-E. Copyright-protected characters from left to right: Iron-Man, Batman, Superman, Spider-Man and Super-Mario.
  • Figure 5: Generated negative samples with character name as negative prompts from Stable Diffusion XL Perp-Neg. Copyright-protected characters from left to right: Iron-Man, Batman, Superman, Spider-Man and Super-Mario.
  • ...and 1 more figure