Semantic to Structure: Learning Structural Representations for Infringement Detection
Chuanwei Huang, Zexi Jia, Hongyan Fei, Yeshuang Zhu, Zhiqiang Yuan, Jinchao Zhang, Jie Zhou
TL;DR
This work tackles structural infringement in image generation by focusing on the structure of an image rather than its content. It introduces an Image Structural Representation and a diffusion-based data synthesis pipeline to produce training pairs with high structural similarity but low semantic similarity, enabling robust learning via MoCo-style contrastive learning. Two manually annotated test sets, SIA (synthetic) and SIR (real), are constructed to evaluate structural infringement detection, and a ViT-L backbone with LoRA is fine-tuned for this purpose. Experimental results show state-of-the-art retrieval performance on both datasets, highlighting the method's potential to protect creators' rights in AI-generated content and guiding future research on structure-focused image understanding.
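To make the "MoCo-style contrastive learning" step concrete, the following is a minimal sketch of an InfoNCE loss with a momentum-updated key encoder, written in plain Python. It is an illustration under stated assumptions, not the authors' implementation: the function names (`info_nce`, `momentum_update`), the temperature of 0.07, and the momentum of 0.999 are hypothetical choices borrowed from common MoCo practice, and the summary above does not specify them. In this setting the positive key would be the embedding of an image that shares the query's structure but not its semantics, and the negatives would come from a queue of unrelated images.

```python
import math


def normalize(v):
    """Scale a vector to unit L2 norm (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def info_nce(query, positive_key, negative_keys, temperature=0.07):
    """InfoNCE loss for one query.

    The positive logit sits at index 0; the negatives follow.
    temperature=0.07 is an assumed value, not taken from the paper.
    """
    q = normalize(query)
    logits = [dot(q, normalize(positive_key)) / temperature]
    logits += [dot(q, normalize(n)) / temperature for n in negative_keys]
    # Numerically stable log-sum-exp over all logits.
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(l - m) for l in logits))
    # Cross-entropy with the positive as the target class.
    return log_sum - logits[0]


def momentum_update(key_params, query_params, m=0.999):
    """Exponential moving average of the key encoder toward the query encoder.

    m=0.999 is an assumed momentum, not taken from the paper.
    """
    return [m * k + (1.0 - m) * q for k, q in zip(key_params, query_params)]


if __name__ == "__main__":
    query = [1.0, 0.0, 0.0]
    negatives = [[0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    # A key pointing the same way as the query yields a small loss.
    print(info_nce(query, [2.0, 0.1, 0.0], negatives))
    # A key pointing the opposite way yields a large loss.
    print(info_nce(query, [-1.0, 0.0, 0.0], negatives))
```

The loss is small when the query and its structurally matched key are close in embedding space and far from the queued negatives, which is the property the synthesized high-structure, low-semantics pairs are meant to teach.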
Abstract
Structural information in images is crucial for aesthetic assessment, and it is widely recognized in the artistic field that imitating the structure of other works significantly infringes on creators' rights. The advancement of diffusion models has led to AI-generated content that imitates artists' structural creations, yet effective detection methods are still lacking. In this paper, we define this phenomenon as "structural infringement" and propose a corresponding detection method. Additionally, we develop quantitative metrics and create manually annotated datasets for evaluation: the SIA dataset of synthesized data and the SIR dataset of real data. Because training datasets for structural infringement detection are currently lacking, we propose a new data synthesis strategy based on diffusion models and an LLM, and use it to train a structural infringement detection model. Experimental results show that our method can successfully detect structural infringements and achieves notable improvements on the annotated test sets.
