Learning to Obstruct Few-Shot Image Classification over Restricted Classes
Amber Yijia Zheng, Chiao-An Yang, Raymond A. Yeh
TL;DR
This paper addresses a security risk of openly released pre-trained backbones by asking whether a model can be meta-learned so that fine-tuning it for a chosen subset of downstream tasks is obstructed. It introduces Learning to Obstruct (LTO), a gradient-based, MAML-like framework that learns a backbone initialization that is poor for restricted classes in few-shot classification (FSC) while preserving performance on non-restricted classes. The authors evaluate LTO on classical FSC methods (ProtoNet, MetaOptNet), CLIP-based FSC (CoOp, Tip-Adapter, CE), and CLIP-based attribute learning, using ImageNet, CIFAR100, and CelebA, with a single obstruction metric, DropRatio@β, and carefully constructed dataset splits. The results show that LTO can significantly degrade restricted-class performance with minimal collateral damage to other classes, supporting a safety-oriented approach to releasing open-source models and motivating further work on obstruction-aware training.
Abstract
Advancements in open-source pre-trained backbones make it relatively easy to fine-tune a model for new tasks. However, this lowered entry barrier poses potential risks, e.g., bad actors developing models for harmful applications. A question arises: Is it possible to develop a pre-trained model that is difficult to fine-tune for certain downstream tasks? To begin studying this, we focus on few-shot classification (FSC). Specifically, we investigate methods to make FSC more challenging for a set of restricted classes while maintaining the performance on other classes. We propose to meta-learn over the pre-trained backbone in a manner that renders it a "poor initialization". Our proposed Learning to Obstruct (LTO) algorithm successfully obstructs four FSC methods across three datasets, including ImageNet and CIFAR100 for image classification, as well as CelebA for attribute classification.
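The idea of meta-learning a "poor initialization" can be illustrated with a toy sketch. The code below is not the authors' implementation and does not reproduce the LTO objective, which this section does not specify. It assumes a logistic-regression model in place of a backbone, plain gradient descent as the fine-tuning procedure, and a first-order approximation of the meta-gradient (the gradient at the fine-tuned parameters is applied directly to the initialization). All names and hyperparameters (`fine_tune`, `obstruct_step`, `weight`, the learning rates) are hypothetical.

```python
import numpy as np


def bce_loss_and_grad(theta, X, y):
    """Mean binary cross-entropy of a logistic model and its gradient w.r.t. theta."""
    z = np.clip(X @ theta, -30.0, 30.0)  # clip logits for numerical stability
    p = 1.0 / (1.0 + np.exp(-z))
    eps = 1e-12
    loss = -np.mean(y * np.log(p + eps) + (1.0 - y) * np.log(1.0 - p + eps))
    grad = X.T @ (p - y) / len(y)
    return loss, grad


def fine_tune(theta, X, y, steps=5, lr=0.5):
    """Inner loop: gradient descent on a support set (stand-in for few-shot fine-tuning)."""
    theta = theta.copy()  # never modify the shared initialization in place
    for _ in range(steps):
        _, grad = bce_loss_and_grad(theta, X, y)
        theta -= lr * grad
    return theta


def obstruct_step(theta, restricted, other, outer_lr=0.1, weight=1.0):
    """One outer update of the initialization (first-order approximation).

    Each task is a tuple (X_support, y_support, X_query, y_query).
    The update ascends the post-fine-tuning query loss on the restricted task
    and descends it on the non-restricted task.
    """
    Xs_r, ys_r, Xq_r, yq_r = restricted
    Xs_o, ys_o, Xq_o, yq_o = other

    # Simulate fine-tuning from the current initialization on each task.
    adapted_r = fine_tune(theta, Xs_r, ys_r)
    adapted_o = fine_tune(theta, Xs_o, ys_o)

    # First-order assumption: use the query gradient at the adapted parameters
    # as the gradient with respect to the initialization.
    _, grad_r = bce_loss_and_grad(adapted_r, Xq_r, yq_r)
    _, grad_o = bce_loss_and_grad(adapted_o, Xq_o, yq_o)

    return theta + outer_lr * (grad_r - weight * grad_o)
```

Calling `obstruct_step` repeatedly over sampled restricted and non-restricted tasks would yield the obstructed initialization in this toy setting. The `weight` parameter is an assumed knob for trading off obstruction against preserved performance; a faithful implementation would differentiate through the fine-tuning steps or follow the paper's own formulation.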
