All Patches Matter, More Patches Better: Enhance AI-Generated Image Detection via Panoptic Patch Learning

Zheng Yang, Ruoxin Chen, Zhiyuan Yan, Ke-Yue Zhang, Xinghe Fu, Shuang Wu, Xiujun Shu, Taiping Yao, Shouhong Ding, Xi Li

TL;DR

This work targets the generalization challenge in detecting AI-generated images (AIGIs) by proposing Panoptic Patch Learning (PPL), which enforces the use of information from all image patches. The framework combines Random Patch Replacement (RPR), which disrupts reliance on dominant patches, with Patch-wise Contrastive Learning (PCL), which aligns patch representations across the image and promotes uniform patch utilization. The authors formalize the principles of All Patches Matter and More Patches Better, diagnose Few-Patch Bias via Total Direct Effect (TDE) analyses, and report state-of-the-art performance across GenImage, DRCT, and Chameleon, with strong robustness to corruptions and masking. The approach offers practical gains for cross-generator generalization, enabling more reliable AIGI detection as generative models evolve. Key ideas include leveraging distributed patch artifacts, mitigating lazy learning, and optimizing a combined loss that preserves discriminative power across all patches: $\mathcal{L}_{total} = \lambda \mathcal{L}_{con} + (1-\lambda) \mathcal{L}_{ce}$, where $\mathcal{L}_{con}$ is a margin-based patch-wise contrastive loss, $\mathcal{L}_{ce}$ is the cross-entropy classification loss, and $\lambda$ weights the two terms. TDE analyses reveal per-patch contributions to detection decisions.
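To make the combined objective concrete, here is a minimal pure-Python sketch. It assumes the classic pairwise form of a margin-based contrastive loss (same-label pairs pulled together, different-label pairs pushed at least `margin` apart); the summary does not give the paper's exact definition of $\mathcal{L}_{con}$, so the function names, the `margin` default, and the pairing scheme are illustrative assumptions only.

```python
import math


def cross_entropy(prob_fake, label):
    """Binary cross-entropy for one prediction; label is 1 (fake) or 0 (real)."""
    eps = 1e-12
    p = min(max(prob_fake, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(label * math.log(p) + (1 - label) * math.log(1.0 - p))


def euclidean(u, v):
    """Euclidean distance between two equal-length embedding vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def patch_contrastive_loss(embeddings, labels, margin=1.0):
    """Assumed margin-based contrastive loss, averaged over all patch pairs.

    Same-label pairs contribute d^2; different-label pairs contribute
    max(0, margin - d)^2. This is the textbook pairwise form, not
    necessarily the one used in the paper.
    """
    total, pairs = 0.0, 0
    for i in range(len(embeddings)):
        for j in range(i + 1, len(embeddings)):
            d = euclidean(embeddings[i], embeddings[j])
            if labels[i] == labels[j]:
                total += d ** 2
            else:
                total += max(0.0, margin - d) ** 2
            pairs += 1
    return total / pairs if pairs else 0.0


def total_loss(l_con, l_ce, lam=0.5):
    """L_total = lam * L_con + (1 - lam) * L_ce, matching the formula above."""
    return lam * l_con + (1.0 - lam) * l_ce
```

With `lam=0` the objective reduces to plain cross-entropy, and with `lam=1` only the patch-level alignment term remains, so `lam` trades image-level classification against uniform patch-level discriminability.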

Abstract

The exponential growth of AI-generated images (AIGIs) underscores the urgent need for robust and generalizable detection methods. In this paper, we establish two key principles for AIGI detection through systematic analysis: (1) All Patches Matter: Unlike conventional image classification where discriminative features concentrate on object-centric regions, each patch in AIGIs inherently contains synthetic artifacts due to the uniform generation process, suggesting that every patch serves as an important artifact source for detection. (2) More Patches Better: Leveraging distributed artifacts across more patches improves detection robustness by capturing complementary forensic evidence and reducing over-reliance on specific patches, thereby enhancing robustness and generalization. However, our counterfactual analysis reveals an undesirable phenomenon: naively trained detectors often exhibit a Few-Patch Bias, discriminating between real and synthetic images based on minority patches. We identify Lazy Learner as the root cause: detectors preferentially learn conspicuous artifacts in limited patches while neglecting broader artifact distributions. To address this bias, we propose the Panoptic Patch Learning (PPL) framework, involving: (1) Random Patch Replacement that randomly substitutes synthetic patches with real counterparts to compel models to identify artifacts in underutilized regions, encouraging the broader use of more patches; (2) Patch-wise Contrastive Learning that enforces consistent discriminative capability across all patches, ensuring uniform utilization of all patches. Extensive experiments across two different settings on several benchmarks verify the effectiveness of our approach.
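Random Patch Replacement, as described in the abstract, can be sketched in a few lines. The version below assumes each synthetic image comes with a spatially aligned real image split into the same patch grid, and that a fixed fraction of patches is swapped; the `ratio` parameter, the returned per-patch labels, and the function name are hypothetical choices for illustration, not details confirmed by the source.

```python
import random


def random_patch_replacement(fake_patches, real_patches, ratio=0.3, rng=None):
    """Swap a random subset of synthetic patches for their real counterparts.

    fake_patches, real_patches: equal-length lists of patches in the same
        grid order (any patch representation works here).
    ratio: fraction of patches to replace (assumed hyperparameter).
    Returns (mixed_patches, patch_labels) with label 1 = synthetic, 0 = real.
    """
    if len(fake_patches) != len(real_patches):
        raise ValueError("patch lists must have the same length")
    if not 0.0 <= ratio <= 1.0:
        raise ValueError("ratio must lie in [0, 1]")
    if rng is None:
        rng = random.Random()

    n = len(fake_patches)
    k = int(round(ratio * n))                # number of patches to replace
    replaced = set(rng.sample(range(n), k))  # indices chosen uniformly at random

    mixed = [real_patches[i] if i in replaced else fake_patches[i] for i in range(n)]
    labels = [0 if i in replaced else 1 for i in range(n)]
    return mixed, labels
```

Because the replaced positions change on every call, a detector that keys on one conspicuous patch will sometimes find that patch swapped for real content, and it then has to find artifacts in the remaining synthetic patches to classify the image correctly.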

Paper Structure

This paper contains 26 sections, 4 equations, 6 figures, 4 tables.

Figures (6)

  • Figure 1: (a) The model trained with PPL distributes attention more uniformly across nearly all patches, suggesting that PPL facilitates comprehensive artifact detection. (b) We compare PPL with other methods under two evaluation settings: Setting-I (GenImage dataset [zhu2024genimage]), where the model is trained on images from a specific generative model and tested on synthetic images from various generative models; and Setting-II (Chameleon dataset [yan2024sanity]), where the model is trained on a diverse set of generative models and evaluated on human-imperceptible synthetic images. For further details, see the experiments section.
  • Figure 2: Patch-wise artifacts produced by AI models, visualized by comparing real images with their synthetic counterparts reconstructed by diffusion models. We observe various patch-level synthetic traces, such as broken lines, unnatural noise, and lost detail on clear boundaries, indicating that artifacts differ from patch to patch. This observation supports leveraging more patches to improve recognition of different artifact types.
  • Figure 3: (a) Attention maps reveal the Few-Patch Bias in naively trained ViT detection models: attention concentrates on a few dominant patches, indicating over-reliance on limited regions. (b) Recall degradation of a naively trained model when single patches of varying sizes are occluded. Such models are sensitive to specific patches, so occluding those patches causes notable recall drops.
  • Figure 4: TDE heatmap of existing methods on generated images selected from the DRCT dataset. A broader and more uniform highlighted region indicates that more patches contribute to classifying an image as fake. The results of UnivFD [ojha2023towards], DRCT [chen2024drct], and Breaking [zheng2024break] are obtained from our implementation.
  • Figure 5: The Panoptic Patch Learning (PPL) framework embodies the principles of All Patches Matter and More Patches Better through two key components: Random Patch Replacement (RPR) and Patch-wise Contrastive Learning (PCL). During training, the model may rely excessively on dominant patches and neglect the others. RPR mitigates this by randomly replacing dominant patches with real ones, prompting the model to detect artifacts in non-dominant patches and thus expanding the dominant regions. PCL further promotes balanced patch utilization by aligning the embeddings of patches that share the same label. Together, RPR and PCL foster comprehensive and uniform patch exploitation.
  • ...and 1 more figure
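The occlusion experiment in Figure 3(b) and the per-patch heatmaps in Figure 4 both ask how much a single patch contributes to the detector's decision. The sketch below illustrates that idea with a simple counterfactual swap: replace one patch at a time with a reference patch and record the score drop. It is an occlusion-style approximation under stated assumptions, not the paper's TDE definition; `score_fn`, `reference_patches`, and `threshold` are hypothetical names introduced here.

```python
def patch_contributions(patches, reference_patches, score_fn):
    """Score drop when each patch is swapped for a reference patch.

    score_fn: hypothetical callable mapping a list of patches to a
        scalar "fake" score (higher means more likely synthetic).
    reference_patches: counterfactual content for each position, for
        example real or blank patches (an assumption of this sketch).
    """
    base = score_fn(patches)
    contributions = []
    for i in range(len(patches)):
        counterfactual = list(patches)       # copy so the input is not mutated
        counterfactual[i] = reference_patches[i]
        contributions.append(base - score_fn(counterfactual))
    return contributions


def patch_usage_fraction(contributions, threshold=0.05):
    """Fraction of patches whose swap changes the score by more than threshold."""
    if not contributions:
        return 0.0
    used = sum(1 for c in contributions if abs(c) > threshold)
    return used / len(contributions)
```

As a toy check, a detector that reads only the first patch (`lambda p: p[0]`) yields a usage fraction of 1/N on an N-patch input, while one that averages all patches yields 1.0 when every patch carries an artifact signal; the first profile is what Few-Patch Bias looks like under this measure.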