Semantic to Structure: Learning Structural Representations for Infringement Detection

Chuanwei Huang, Zexi Jia, Hongyan Fei, Yeshuang Zhu, Zhiqiang Yuan, Jinchao Zhang, Jie Zhou

TL;DR

This work tackles structural infringement in image generation by focusing on the structure of an image rather than its content. It introduces an Image Structural Representation and a diffusion-based data synthesis pipeline that produces training pairs with high structural similarity but low semantic similarity, which are then used for MoCo-style contrastive learning. Two manually annotated test sets, SIA (synthetic) and SIR (real), are constructed to evaluate structural infringement detection, and a ViT-L backbone is fine-tuned with LoRA for the task. Experiments show state-of-the-art retrieval performance on both datasets, highlighting the method's potential to protect creators' rights against AI-generated content and to guide future research on structure-focused image understanding.
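The TL;DR names MoCo-style contrastive learning as the training objective. As a minimal sketch of that objective (not the authors' code), the snippet below computes the MoCo InfoNCE loss for one query, assuming the embedding of a source image is the query, the embedding of its synthesized, structurally similar counterpart is the positive key, and a FIFO queue of earlier keys supplies the negatives. It also shows the momentum update of the key encoder. Plain Python lists stand in for tensors, and the temperature 0.07 and momentum 0.999 are assumed defaults borrowed from the original MoCo setup, not hyperparameters reported for this work.

```python
import math
from collections import deque


def l2_normalize(v):
    """Scale a vector to unit length, as MoCo does for queries and keys."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def moco_info_nce(query, positive_key, queue, temperature=0.07):
    """-log softmax score of the positive key among [positive] + queued negatives.

    Assumption: `query` embeds a source image and `positive_key` embeds its
    structurally similar synthetic counterpart; `queue` holds earlier keys.
    """
    q = l2_normalize(query)
    keys = [positive_key] + list(queue)
    logits = [dot(q, l2_normalize(k)) / temperature for k in keys]
    # Numerically stable log-sum-exp over all logits.
    peak = max(logits)
    log_sum_exp = peak + math.log(sum(math.exp(z - peak) for z in logits))
    return log_sum_exp - logits[0]


def momentum_update(key_params, query_params, m=0.999):
    """theta_k <- m * theta_k + (1 - m) * theta_q, applied element-wise."""
    return [m * k + (1.0 - m) * q for k, q in zip(key_params, query_params)]


# Negatives live in a fixed-size FIFO queue; the oldest keys drop out first.
queue = deque([[0.0, 1.0], [-1.0, 0.0]], maxlen=4096)
loss = moco_info_nce([1.0, 0.0], [0.9, 0.1], queue)
queue.append([0.9, 0.1])  # enqueue the current key after the step
```

The queue lets each query be contrasted against many negatives without recomputing them, and the slow momentum update keeps the queued keys consistent with one another as the query encoder changes.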

Abstract

Structural information in images is crucial for aesthetic assessment, and it is widely recognized in the artistic field that imitating the structure of another work significantly infringes on its creator's rights. With the advancement of diffusion models, AI-generated content increasingly imitates artists' structural creations, yet effective detection methods are still lacking. In this paper, we define this phenomenon as "structural infringement" and propose a corresponding detection method. We also develop quantitative metrics and create two manually annotated datasets for evaluation: SIA, consisting of synthesized data, and SIR, consisting of real data. Given the current lack of datasets for structural infringement detection, we propose a new data synthesis strategy based on diffusion models and a large language model (LLM), which we use to train a structural infringement detection model. Experimental results show that our method successfully detects structural infringements and achieves notable improvements on the annotated test sets.
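The abstract mentions quantitative metrics without defining them here, and the TL;DR reports retrieval performance. One common way to score such retrieval, shown below as an assumed stand-in rather than the paper's exact metric, is Recall@k: embed each query and every gallery image, rank the gallery by cosine similarity, and count how often the annotated infringing match lands in the top k. The function names and toy vectors are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)


def recall_at_k(queries, gallery, targets, k=1):
    """Fraction of queries whose annotated match ranks among the k most similar gallery items.

    queries: list of query embeddings
    gallery: list of gallery embeddings
    targets: targets[i] is the gallery index annotated as the match for queries[i]
    """
    hits = 0
    for query, target in zip(queries, targets):
        ranked = sorted(range(len(gallery)), key=lambda i: cosine(query, gallery[i]), reverse=True)
        if target in ranked[:k]:
            hits += 1
    return hits / len(queries)


queries = [[1.0, 0.0], [0.0, 1.0]]
gallery = [[0.9, 0.1], [0.1, 0.9], [-1.0, 0.0]]
print(recall_at_k(queries, gallery, targets=[0, 1], k=1))  # 1.0
print(recall_at_k(queries, gallery, targets=[1, 0], k=1))  # 0.0
print(recall_at_k(queries, gallery, targets=[1, 0], k=2))  # 1.0
```

With k=1 this reduces to top-1 retrieval accuracy, the setting illustrated by the top-1 results in Figure 3.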

Paper Structure

This paper contains 9 sections, 2 equations, 3 figures, 2 tables.

Figures (3)

  • Figure 1: Structural infringement image pairs in the SIR and SIA datasets. (a) The SIR dataset contains image pairs that exhibit structural infringement in the real world. (b) The SIA dataset contains image pairs with structural infringement generated by diffusion models, with real images on the left and synthetic images on the right. Despite their low content similarity, these pairs exhibit high structural similarity, indicating potential structural infringement.
  • Figure 2: Data synthesis pipeline. (a) Given a source image and its caption, DPT first extracts a depth map that captures the main structural information. An LLM then modifies the attributes of the main objects in the caption to change the semantic information. (b) The modified caption and the depth map are fed into SDXL and ControlNet, respectively, generating images with high structural similarity but low semantic similarity to the source image.
  • Figure 3: Top-1 retrieved images on the SIA dataset using DINO and our proposed image structural representation. For each pair, the left image is the query and the right image is the retrieval result. The cosine similarity score of each pair is shown below it.
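The data synthesis pipeline described in Figure 2 can be sketched as a simple data flow. In the paper the three stages are DPT (depth estimation), an LLM (caption attribute editing), and SDXL with ControlNet (depth-conditioned generation); in this minimal sketch they are passed in as plain callables with hypothetical names (`estimate_depth`, `edit_caption`, `generate`), so the orchestration runs without any model weights. Each source image and its generated counterpart form one training pair intended to share structure while differing in semantics.

```python
from dataclasses import dataclass
from typing import Any, Callable, Iterable, List, Tuple


@dataclass
class TrainingPair:
    source_image: Any
    synthetic_image: Any
    source_caption: str
    edited_caption: str


def synthesize_pairs(
    samples: Iterable[Tuple[Any, str]],
    estimate_depth: Callable[[Any], Any],   # hypothetical stand-in for DPT
    edit_caption: Callable[[str], str],     # hypothetical stand-in for the LLM attribute edit
    generate: Callable[[str, Any], Any],    # hypothetical stand-in for SDXL + depth ControlNet
) -> List[TrainingPair]:
    """Turn (image, caption) samples into structure-preserving, semantics-changing pairs."""
    pairs = []
    for image, caption in samples:
        depth_map = estimate_depth(image)              # (a) capture the main structure
        new_caption = edit_caption(caption)            # (a) change main-object attributes
        synthetic = generate(new_caption, depth_map)   # (b) text -> SDXL, depth -> ControlNet
        pairs.append(TrainingPair(image, synthetic, caption, new_caption))
    return pairs


# Toy stand-ins so the flow can be traced without any models.
pairs = synthesize_pairs(
    [("img_0", "a red car parked on a street")],
    estimate_depth=lambda img: f"depth({img})",
    edit_caption=lambda c: c.replace("red car", "wooden cart"),
    generate=lambda text, depth: (text, depth),
)
print(pairs[0].synthetic_image)  # ('a wooden cart parked on a street', 'depth(img_0)')
```

Replacing the lambdas with real model calls would leave the loop unchanged; only the stage implementations differ, which keeps the pairing logic easy to test in isolation.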