Fine-Tuning Text-To-Image Diffusion Models for Class-Wise Spurious Feature Generation
AprilPyone MaungMaung, Huy H. Nguyen, Hitoshi Kiya, Isao Echizen
TL;DR
The paper addresses the problem of efficiently generating spurious features for evaluating classifier robustness. It fine-tunes a large-scale text-to-image diffusion model (Stable Diffusion) on a small set of reference spurious images using a novel spurious feature similarity loss. The method extends DreamBooth with joint optimization of the text encoder and the noise predictor, and introduces the spurious feature similarity loss $\mathcal{L}_{\text{SFSL}}$, combined with a prior preservation term, to steer generated outputs toward class-wise spurious cues. Experiments on six Spurious ImageNet classes show that the generated images are spurious across multiple classifiers and visually resemble the reference spurious images, outperforming or complementing existing Spurious ImageNet data for spurious-feature evaluation. The approach offers a scalable, controllable way to produce synthetic, cross-class spurious data, with practical implications for testing and training robust classifiers; the discussion acknowledges artifacts and context-dependent limitations.
Abstract
We propose a method for generating spurious features by leveraging large-scale text-to-image diffusion models. Although previous work detected spurious features in a large-scale dataset such as ImageNet and introduced Spurious ImageNet, we found that not all spurious images are spurious across different classifiers. Spurious images help measure a classifier's reliance on spurious features, but filtering many images from the Internet to find more spurious features is time-consuming. To this end, we use an existing approach for personalizing large-scale text-to-image diffusion models with the spurious images already discovered, and we propose a new spurious feature similarity loss based on the neural features of an adversarially robust model. Specifically, we fine-tune Stable Diffusion on several reference images from Spurious ImageNet with a modified objective that incorporates the proposed spurious feature similarity loss. Experimental results show that our method can generate spurious images that are consistently spurious across different classifiers. Moreover, the generated spurious images are visually similar to the reference images from Spurious ImageNet.
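The modified objective described above can be illustrated with a minimal NumPy sketch. The exact form of the spurious feature similarity loss is not given in this summary, so the code below makes several assumptions: the similarity term is taken to be a cosine distance between feature vectors, the three terms are summed with scalar weights, and the names `combined_loss`, `lambda_prior`, and `lambda_sfsl` are hypothetical. The feature vectors stand in for the output of an adversarially robust model, which is not implemented here.

```python
import numpy as np


def mse(a, b):
    """Mean squared error between two arrays of the same shape."""
    return float(np.mean((a - b) ** 2))


def cosine_distance(u, v, eps=1e-8):
    """Return 1 - cosine similarity of two feature vectors (assumed form)."""
    u = np.ravel(u)
    v = np.ravel(v)
    denom = np.linalg.norm(u) * np.linalg.norm(v) + eps
    return float(1.0 - np.dot(u, v) / denom)


def combined_loss(noise_pred, noise, prior_pred, prior_noise,
                  feat_gen, feat_ref, lambda_prior=1.0, lambda_sfsl=1.0):
    """Hypothetical combined objective (a sketch, not the authors' code).

    noise_pred / noise: predicted and true noise for the reference images
        (DreamBooth-style denoising term).
    prior_pred / prior_noise: the same quantities for class-prior images
        (prior preservation term).
    feat_gen / feat_ref: features of a generated image and of a reference
        spurious image, assumed to come from a robust feature extractor.
    """
    denoise_term = mse(noise_pred, noise)
    prior_term = mse(prior_pred, prior_noise)
    sfsl_term = cosine_distance(feat_gen, feat_ref)  # assumed similarity measure
    return denoise_term + lambda_prior * prior_term + lambda_sfsl * sfsl_term
```

In this sketch, setting `lambda_sfsl=0.0` recovers a plain DreamBooth-style objective with prior preservation, which makes the contribution of the feature term easy to isolate.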
