What Lurks Within? Concept Auditing for Shared Diffusion Models at Scale
Xiaoyong Yuan, Xiaolong Ma, Linke Guo, Lan Zhang
TL;DR
This work addresses the need for practical pre-deployment auditing of fine-tuned diffusion models by introducing PAIA, a model-centric framework that avoids brittle prompt-based probes and costly image-based detectors. PAIA is prompt-agnostic because it analyzes late-stage denoising behavior, and image-free because it relies on a calibrated metric (conditional calibrated error) that compares the fine-tuned model with its base model. On both controlled and real-world models, it achieves over 90% detection accuracy while cutting auditing time by 18-40× relative to existing baselines. It is robust to adaptive attacks and generalizes to rare concepts, offering a scalable approach for safer, more transparent diffusion-model sharing on public hubs, and it lays groundwork for broader auditing and safety tooling in diffusion-based AI systems.
Abstract
Diffusion models (DMs) have revolutionized text-to-image generation, enabling the creation of highly realistic and customized images from text prompts. With the rise of parameter-efficient fine-tuning (PEFT) techniques, users can now customize powerful pre-trained models using minimal computational resources. However, the widespread sharing of fine-tuned DMs on open platforms raises growing ethical and legal concerns, as these models may inadvertently or deliberately generate sensitive or unauthorized content. Despite increasing regulatory attention on generative AI, there are currently no practical tools for systematically auditing these models before deployment. In this paper, we address the problem of concept auditing: determining whether a fine-tuned DM has learned to generate a specific target concept. Existing approaches typically rely on prompt-based input crafting and output-based image classification, but they suffer from critical limitations, including prompt uncertainty, concept drift, and poor scalability. To overcome these challenges, we introduce Prompt-Agnostic Image-Free Auditing (PAIA), a novel, model-centric concept auditing framework. By treating the DM as the object of inspection, PAIA enables direct analysis of internal model behavior, bypassing the need for optimized prompts or generated images. We evaluate PAIA on 320 controlled models trained with curated concept datasets and 771 real-world community models sourced from a public DM sharing platform. Evaluation results show that PAIA achieves over 90% detection accuracy while reducing auditing time by 18-40× compared to existing baselines. To our knowledge, PAIA is the first scalable and practical solution for pre-deployment concept auditing of diffusion models, providing a foundation for safer and more transparent diffusion model sharing.
