DGS-Net: Distillation-Guided Gradient Surgery for CLIP Fine-Tuning in AI-Generated Image Detection

Jiazhen Yan, Ziqiang Li, Fan Wang, Boyu Wang, Zhangjie Fu

TL;DR

The paper tackles the problem of catastrophic forgetting during CLIP fine-tuning for AI-generated image detection by introducing DGS-Net, a gradient-space approach that disentangles transferable priors from task-irrelevant cues. It combines Orthogonal Suppression, which projects away harmful gradient components, with Prior Alignment, which injects beneficial priors from a frozen CLIP encoder, preserving pre-trained geometry while learning artifact-level features. Empirical results across 50 generative models and multiple benchmarks show substantial gains in both accuracy and average precision, with strong cross-generator generalization and robustness to perturbations. This work advances reliable AI-generated image detection by ensuring effective artifact learning without eroding the useful priors learned from large-scale pre-training.

Abstract

The rapid progress of generative models such as GANs and diffusion models has led to the widespread proliferation of AI-generated images, raising concerns about misinformation, privacy violations, and the erosion of trust in digital media. Although large-scale multimodal models like CLIP offer strong transferable representations for detecting synthetic content, fine-tuning them often induces catastrophic forgetting, which degrades pre-trained priors and limits cross-domain generalization. To address this issue, we propose the Distillation-guided Gradient Surgery Network (DGS-Net), a novel framework that preserves transferable pre-trained priors while suppressing task-irrelevant components. Specifically, we introduce a gradient-space decomposition that separates harmful and beneficial descent directions during optimization. By projecting task gradients onto the orthogonal complement of the harmful directions and aligning them with beneficial directions distilled from a frozen CLIP encoder, DGS-Net jointly optimizes prior preservation and the suppression of task-irrelevant components. Extensive experiments on 50 generative models demonstrate that our method outperforms state-of-the-art approaches by an average margin of 6.6, achieving superior detection performance and generalization across diverse generation techniques.
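The projection step described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the vectors `task_grad`, `harmful_dir`, and `beneficial_dir`, the helper names, and the `align_weight` parameter are all hypothetical. How DGS-Net actually obtains the harmful and beneficial directions is not specified in this summary. The additive alignment term is likewise an assumption, chosen only to show the general shape of a gradient-surgery update.

```python
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def project_out(grad: Vector, direction: Vector, eps: float = 1e-12) -> Vector:
    """Return `grad` with its component along `direction` removed,
    i.e. the projection of `grad` onto the orthogonal complement
    of `direction`."""
    norm_sq = dot(direction, direction)
    if norm_sq < eps:
        # Degenerate direction: nothing to remove.
        return list(grad)
    coeff = dot(grad, direction) / norm_sq
    return [g - coeff * d for g, d in zip(grad, direction)]


def surgery_step(
    task_grad: Vector,
    harmful_dir: Vector,
    beneficial_dir: Vector,
    align_weight: float = 0.5,  # hypothetical weighting, not from the paper
) -> Vector:
    """Suppress the harmful component, then add a weighted beneficial
    direction (an assumed, simplified form of alignment)."""
    suppressed = project_out(task_grad, harmful_dir)
    return [s + align_weight * b for s, b in zip(suppressed, beneficial_dir)]


# Toy 3-dimensional example with made-up vectors.
task_grad = [1.0, 2.0, 0.0]
harmful_dir = [0.0, 1.0, 0.0]
beneficial_dir = [0.0, 0.0, 1.0]

suppressed = project_out(task_grad, harmful_dir)
update = surgery_step(task_grad, harmful_dir, beneficial_dir)
```

In this toy case the suppressed gradient has zero inner product with `harmful_dir`, so a step along it leaves the harmful direction untouched to first order, while the added term moves the update toward the beneficial direction.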

Paper Structure

This paper contains 20 sections, 17 equations, 3 figures, 9 tables.

Figures (3)

  • Figure 1: t-SNE visualization of features extracted using CLIP, CLIP-LoRA, and our method. Our method achieves strong real/fake discrimination while preserving the prior knowledge embedded in the pre-trained model.
  • Figure 2: Overview of the proposed Distillation-guided Gradient Surgery Network (DGS-Net). We introduce a gradient-space decomposition that separates harmful and beneficial descent directions during optimization. The network consists of two core components, Orthogonal Suppression and Prior Alignment, which respectively suppress task-irrelevant representations and preserve the transferable priors established during large-scale pre-training, substantially enhancing the generalization performance of AI-generated image detection.
  • Figure 3: Classification using only text descriptions achieves an accuracy of approximately 60%. We used BLIP to convert images into textual descriptions and trained the detector directly on these text inputs. The resulting detection accuracy fluctuated around 60%, indicating that semantic information does carry label-related signals. However, most of these signals act as distractors, reflecting dataset-specific shortcuts that hinder cross-generator generalization.