
SOPHON: Non-Fine-Tunable Learning to Restrain Task Transferability For Pre-trained Models

Jiangyi Deng, Shengyuan Pang, Yanjiao Chen, Liangming Xia, Yijie Bai, Haiqin Weng, Wenyuan Xu

TL;DR

The paper tackles the risk that powerful pre-trained models can be misused by fine-tuning them for tasks in restricted domains. It proposes non-fine-tunable learning via SOPHON, which alternates between suppressing simulated adversarial fine-tuning and reinforcing original-domain performance, an approach inspired by model-agnostic meta-learning (MAML). SOPHON introduces specialized suppression losses (ICE, KLU, and DoS) to ensure stable convergence, and it is evaluated for robustness across classification and generation tasks, architectures, optimizers, and hyperparameters. Empirical results show that SOPHON raises the cost of restricted-domain fine-tuning to a level that often matches or exceeds training from scratch, while preserving high performance on benign tasks, highlighting its potential for safer, more responsible AI deployment.

Abstract

Instead of building deep learning models from scratch, developers increasingly rely on adapting pre-trained models to their customized tasks. However, powerful pre-trained models may be misused for unethical or illegal tasks, e.g., privacy inference and unsafe content generation. In this paper, we introduce a pioneering learning paradigm, non-fine-tunable learning, which prevents the pre-trained model from being fine-tuned to indecent tasks while preserving its performance on the original task. To fulfill this goal, we propose SOPHON, a protection framework that reinforces a given pre-trained model to be resistant to being fine-tuned in pre-defined restricted domains. This is challenging because adversaries may adopt a wide variety of complicated fine-tuning strategies. Inspired by model-agnostic meta-learning, we overcome this difficulty by designing sophisticated fine-tuning simulation and fine-tuning evaluation algorithms. In addition, we carefully design the optimization process to entrap the pre-trained model within a hard-to-escape local optimum regarding restricted domains. We have conducted extensive experiments on two deep learning modes (classification and generation), seven restricted domains, and six model architectures to verify the effectiveness of SOPHON. Experimental results show that fine-tuning SOPHON-protected models incurs an overhead comparable to or even greater than training from scratch. Furthermore, we confirm the robustness of SOPHON to three fine-tuning methods, five optimizers, and various learning rates and batch sizes. SOPHON may help boost further investigations into safe and responsible AI.

Paper Structure

This paper contains 44 sections, 23 equations, 14 figures, 7 tables, 1 algorithm.

Figures (14)

  • Figure 1: The objectives of non-fine-tunable learning. (1) Intactness: it should preserve the model performance in the original domain. (2) Non-fine-tunability: fine-tuning the model in the restricted domain should incur a comparable or even greater overhead than training the model from scratch.
  • Figure 2: Design of Sophon. Sophon mainly consists of two alternating phases: the fine-tuning suppression (FTS) loops and the normal training reinforcement (NTR) loops. The FTS loops simulate different fine-tuning processes and degrade the fine-tuning performance in the restricted domain. The NTR loops maintain the performance in the original domain. The hyper-parameters are the number of tasks $N$, the number of updates $K$, the learning rates $\alpha$ (FTS loops) and $\beta$ (NTR loops), the numbers of FTS loops $\ell_\mathrm{FTS}$ and NTR loops $\ell_\mathrm{NTR}$, and the total number of iterations $\mathrm{Iter}$.
  • Figure 3: Effectiveness of Sophon compared with three baselines. Fine-tuning the original or the NTL model can achieve high accuracy, leading to model misuse. All three methods of fine-tuning the Sophon model yield poorer performance than training the model from scratch.
  • Figure 4: Effectiveness of Sophon compared with two baselines. Fine-tuning the original model can achieve low losses, leading to model misuse. Fine-tuning the Sophon model yields higher losses than training the model from scratch.
  • Figure 5: Effectiveness of Sophon compared with two baselines. B1 and B2 both denoise images well in the restricted domain (CelebA). The Sophon model cannot denoise images from the restricted domain and is thus protected.
  • ...and 9 more figures
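
The alternating structure described in the Figure 2 caption can be illustrated with a toy sketch. This is not the authors' implementation: it assumes a one-parameter linear model `y = w * x`, treats each simulated "task" as a different adversary learning rate, uses a first-order approximation (it ignores how the simulated updates depend on `w`), and ascends a plain mean squared error instead of the ICE, KLU, or DoS suppression losses. All function names, data, and hyper-parameter values are hypothetical.

```python
from typing import List, Sequence, Tuple

Data = Sequence[Tuple[float, float]]  # (x, y) pairs

# Hypothetical toy domains: the original task is y = 2x, the restricted task is y = -3x.
XS = (-1.0, -0.5, 0.5, 1.0)
ORIGINAL: Data = [(x, 2.0 * x) for x in XS]
RESTRICTED: Data = [(x, -3.0 * x) for x in XS]


def mse_and_grad(w: float, data: Data) -> Tuple[float, float]:
    """Mean squared error of the toy model y = w * x and its derivative in w."""
    n = len(data)
    loss = sum((w * x - y) ** 2 for x, y in data) / n
    grad = sum(2.0 * (w * x - y) * x for x, y in data) / n
    return loss, grad


def simulate_fine_tuning(w: float, data: Data, lr: float, k_updates: int) -> List[float]:
    """Return the weight after each of K simulated adversary gradient steps."""
    states = []
    for _ in range(k_updates):
        _, grad = mse_and_grad(w, data)
        w -= lr * grad
        states.append(w)
    return states


def protect(
    w: float,
    original: Data,
    restricted: Data,
    task_lrs: Sequence[float],  # N simulated fine-tuning tasks (assumption: one per learning rate)
    k_updates: int,             # K updates per simulated task
    alpha: float,               # learning rate of the FTS loops
    beta: float,                # learning rate of the NTR loops
    l_fts: int,                 # number of FTS loops per iteration
    l_ntr: int,                 # number of NTR loops per iteration
    iters: int,                 # total number of iterations
) -> float:
    for _ in range(iters):
        # Fine-tuning suppression: raise the restricted-domain loss that the
        # adversary would see after each simulated update.
        for _ in range(l_fts):
            total = 0.0
            for lr in task_lrs:
                for w_adapted in simulate_fine_tuning(w, restricted, lr, k_updates):
                    _, grad = mse_and_grad(w_adapted, restricted)
                    total += grad  # first-order: gradient taken at the adapted weight
            total /= len(task_lrs) * k_updates
            w += alpha * total  # gradient ascent on the fine-tuned loss
        # Normal training reinforcement: ordinary descent on the original domain.
        for _ in range(l_ntr):
            _, grad = mse_and_grad(w, original)
            w -= beta * grad
    return w
```

Calling `protect(2.0, ORIGINAL, RESTRICTED, task_lrs=(0.1, 0.2), k_updates=3, alpha=0.05, beta=0.2, l_fts=1, l_ntr=2, iters=50)` moves `w` away from the restricted-domain optimum while the NTR steps keep it near the original-domain optimum. A one-parameter linear model cannot actually become hard to fine-tune, so the sketch only shows the loop structure. Ascending an unbounded loss such as MSE stays stable here only because `alpha` is small relative to `beta`.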