Few-Shot Class-Incremental Model Attribution Using Learnable Representation From CLIP-ViT Features

Hanbyul Lee; Juneho Yi

TL;DR

This work tackles the rapid emergence of unseen generative models by reframing model attribution (MA) as a few-shot class-incremental learning (FSCIL) problem. It introduces a learnable, multi-level CLIP-ViT representation via an Adaptive Integration Module (AIM), which assigns per-image weights to block features, and integrates this representation within a TEEN-based FSCIL framework. The approach is validated on 28 generators, with a base session of GANs and incremental sessions that add diffusion-based models, and shows strong, scalable attribution across model evolution and CLIP backbones. The results indicate that low-level information is crucial for MA and that combining information across all levels with AIM yields the best performance, enabling rapid adaptation to newly released generators with minimal data.

Abstract

Recently, images that distort or fabricate facts using generative models have become a social concern. To cope with the continuous evolution of generative artificial intelligence (AI) models, model attribution (MA) is necessary beyond mere detection of synthetic images. However, current deep learning-based MA methods must be trained from scratch with new data to recognize unseen models, which is time-consuming and data-intensive. This work proposes a new strategy for dealing with persistently emerging generative models. We adapt few-shot class-incremental learning (FSCIL) mechanisms to the MA problem to uncover novel generative AI models. Unlike existing FSCIL approaches, which focus on object classification using high-level information, MA requires analyzing low-level details such as color and texture in synthetic images. Thus, we utilize a learnable representation built from different levels of CLIP-ViT features. To learn an effective representation, we propose an Adaptive Integration Module (AIM) that calculates a weighted sum of CLIP-ViT block features for each image, enhancing the ability to identify generative models. Extensive experiments show that our method effectively extends from prior generative models to recent ones.
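
The per-image weighted sum of block features described above can be illustrated with a minimal sketch. The abstract does not specify how AIM computes its weights, so the scoring rule here (one hypothetical learnable vector per block, followed by a softmax over blocks) is an assumption made only for illustration, as are the names `integrate_blocks` and `score_vecs`. Random vectors stand in for real CLIP-ViT block features.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def integrate_blocks(block_feats, score_vecs):
    """Fuse per-block features of ONE image into a single vector.

    block_feats: list of L feature vectors (one per ViT block), each of length D.
    score_vecs:  list of L hypothetical learnable vectors of length D
                 (an assumed parameterization, not taken from the paper).
    Returns (weights, fused), where weights sum to 1 over the L blocks.
    """
    # One scalar score per block, computed from that image's own features,
    # so different images receive different weights.
    scores = [
        sum(f * w for f, w in zip(feat, vec))
        for feat, vec in zip(block_feats, score_vecs)
    ]
    weights = softmax(scores)
    dim = len(block_feats[0])
    # Weighted sum across blocks, dimension by dimension.
    fused = [
        sum(weights[b] * block_feats[b][d] for b in range(len(block_feats)))
        for d in range(dim)
    ]
    return weights, fused


random.seed(0)
num_blocks, dim = 12, 8  # toy sizes; placeholders for real backbone dimensions

# Stand-in for learned parameters (randomly initialized here).
score_vecs = [[random.gauss(0, 0.1) for _ in range(dim)] for _ in range(num_blocks)]

# Stand-ins for the block features of two different images.
image_a = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(num_blocks)]
image_b = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(num_blocks)]

weights_a, fused_a = integrate_blocks(image_a, score_vecs)
weights_b, fused_b = integrate_blocks(image_b, score_vecs)
```

Because the weights are a function of each image's own block features, `weights_a` and `weights_b` differ, which is the sense in which the integration is per-image. In a few-shot incremental setting, the fused vector would then feed a classifier over generator classes; that stage is not shown here.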
