AnimeDL-2M: Million-Scale AI-Generated Anime Image Detection and Localization in Diffusion Era

Chenyang Zhu, Xing Zhang, Yuyang Sun, Ching-Chun Chang, Isao Echizen

TL;DR

AnimeDL-2M addresses the lack of anime-focused IMDL benchmarks by introducing a dataset of over two million real, edited, and AI-generated anime images with rich annotations. The paper also proposes AniXplore, a domain-tailored IMDL model that fuses texture-based and semantics-based features to improve detection and localization in anime imagery. In the reported experiments, AniXplore outperforms six state-of-the-art methods and generalizes well in detection, and ablations highlight the value of frequency features, dual-branch fusion, and adaptive loss balancing. Together, the dataset and model provide a practical resource for copyright protection and content moderation of AI-generated anime content, and they set the stage for future domain-specific research on AI forensics of stylized media.

Abstract

Recent advances in image generation, particularly diffusion models, have significantly lowered the barrier for creating sophisticated forgeries, making image manipulation detection and localization (IMDL) increasingly challenging. While prior work in IMDL has focused largely on natural images, the anime domain remains underexplored despite its growing vulnerability to AI-generated forgeries. Misrepresentation of AI-generated images as hand-drawn artwork, copyright violations, and inappropriate content modifications pose serious threats to the anime community and industry. To address this gap, we propose AnimeDL-2M, the first large-scale benchmark for anime IMDL with comprehensive annotations. It comprises over two million images, including real, partially manipulated, and fully AI-generated samples. Experiments indicate that models trained on existing IMDL datasets of natural images perform poorly when applied to anime images, highlighting a clear domain gap between anime and natural images. To better handle IMDL tasks in the anime domain, we further propose AniXplore, a novel model tailored to the visual characteristics of anime imagery. Extensive evaluations demonstrate that AniXplore achieves superior performance compared to existing methods. The dataset and code are available at https://flytweety.github.io/AnimeDL2M/.

Paper Structure

This paper contains 29 sections, 5 equations, 4 figures, 7 tables.

Figures (4)

  • Figure 1: An overview of AnimeDL-2M's data construction pipeline and a data example. The image perception component reads the image and outputs an image caption as well as the objects found in the image. The image segmentation component randomly picks one object per image and generates its mask. The image generation component uses inpainting and text-to-image methods with 6 different models to create 6 fake images for each raw image. Captions, objects, mask labels, and editing methods serve as extra annotations.
  • Figure 2: Aesthetic distribution of real and synthetic anime images. Note that inpainted images have a similar distribution to real images.
  • Figure 3: Top-30 subject distribution of the AnimeDL-2M dataset. The diverse range of subjects highlights the open-world nature of the dataset, making it suitable for training robust and generalized IMDL models.
  • Figure 4: Overview of AniXplore, which consists of a Mixed Feature Extractor, a Dual-Perception Encoder, and a Localization and Classification Predictor, and uses information from both local textures and global semantics for anime IMDL.
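
The data construction flow described in the Figure 1 caption (perceive, segment one random object, then generate one fake per model) can be sketched roughly as below. This is only an illustrative outline, not the authors' code: every function name, the placeholder model names, and the 3/3 split between inpainting and text-to-image models are assumptions, since the caption states only that 6 models produce 6 fake images per raw image. The stubs return fixed values so the control flow can be run without any real perception, segmentation, or diffusion model.

```python
import random
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical placeholders. The caption says only "6 different models" across
# inpainting and text-to-image; the names and the 3/3 split are assumptions.
GENERATORS = [
    ("inpainting", "inpaint_model_a"),
    ("inpainting", "inpaint_model_b"),
    ("inpainting", "inpaint_model_c"),
    ("text-to-image", "t2i_model_a"),
    ("text-to-image", "t2i_model_b"),
    ("text-to-image", "t2i_model_c"),
]


@dataclass
class Perception:
    caption: str
    objects: List[str]


@dataclass
class FakeRecord:
    source_id: str
    method: str                # editing method annotation
    model: str
    caption: str               # caption annotation
    objects: List[str]         # object annotation
    target_object: str         # the randomly chosen object
    mask_label: Optional[str]  # mask annotation (a placeholder string here)


def perceive(image_id: str) -> Perception:
    """Stub for the image perception component (caption + objects)."""
    return Perception(caption=f"caption of {image_id}",
                      objects=["hair", "eyes", "ribbon"])


def segment(image_id: str, obj: str) -> str:
    """Stub for the segmentation component; returns a mask identifier."""
    return f"{image_id}:{obj}:mask"


def build_records(image_id: str, rng: random.Random) -> List[FakeRecord]:
    """Create one annotation record per generator for a single raw image."""
    perception = perceive(image_id)
    target = rng.choice(perception.objects)  # one random object per image
    mask = segment(image_id, target)
    records = []
    for method, model in GENERATORS:
        # Assumption: only inpainting consumes the object mask; how masks are
        # labeled for fully generated images is not stated in the caption.
        records.append(FakeRecord(
            source_id=image_id,
            method=method,
            model=model,
            caption=perception.caption,
            objects=perception.objects,
            target_object=target,
            mask_label=mask if method == "inpainting" else None,
        ))
    return records


if __name__ == "__main__":
    for record in build_records("img_0001", random.Random(0)):
        print(record.method, record.model, record.target_object)
```

In a real pipeline the stubs would be replaced by actual captioning, segmentation, and diffusion models, and each record would also point to the generated image file.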