Revocable Backdoor for Deep Model Trading

Yiran Xu; Nan Zhong; Zhenxing Qian; Xinpeng Zhang

TL;DR

The paper addresses the risk of backdoors in deployed deep models by reframing the backdoor as a controllable asset within a model-trading workflow. It introduces a revocable backdoor mechanism based on trainable mask matrices that gate backdoor behavior at interior feature maps, while a coordinated trigger and loss design preserves clean-task fidelity. The key contributions are a practical withdrawal method via masks, trigger fine-tuning to balance imperceptibility and robustness, and empirical validation across multiple datasets and architectures that shows feasibility and resilience against purification defenses. The approach offers a risk-managed path for exchanging deep models as tradable digital products, with a built-in detoxification step upon final payment.
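
To make the idea of gating an interior feature map concrete, the following is a minimal sketch and not the authors' implementation. It assumes a toy linear classifier head, a hand-set binary mask (the paper's masks are trainable), and a single hypothetical "backdoor channel"; the names `gated_forward`, `weights`, and `mask` are illustrative only.

```python
import numpy as np

def gated_forward(features, weights, mask=None):
    """Toy classifier head: optionally gate the feature map, then apply a linear layer."""
    if mask is not None:
        # Element-wise gating of the interior feature map (assumed form of the mask).
        features = features * mask
    return int(np.argmax(weights @ features))

# Hypothetical 4-channel feature vector; channel 3 stands in for the backdoor pathway.
weights = np.array([
    [1.0, 0.0, 0.0, 0.0],  # class 0: the true class for this input
    [0.0, 1.0, 0.0, 5.0],  # class 1: the attacker's target, driven by channel 3
])
clean_feat = np.array([2.0, 1.0, 0.0, 0.0])
poison_feat = np.array([2.0, 1.0, 0.0, 1.0])  # the trigger activates channel 3

# Hand-set mask that zeroes the backdoor channel (an assumption; not learned here).
mask = np.array([1.0, 1.0, 1.0, 0.0])

trial_clean = gated_forward(clean_feat, weights)          # trial model, clean input
trial_poison = gated_forward(poison_feat, weights)        # trial model, triggered input
final_clean = gated_forward(clean_feat, weights, mask)    # masked model, clean input
final_poison = gated_forward(poison_feat, weights, mask)  # masked model, triggered input
```

In this toy setting the trial model misclassifies only the triggered input, and applying the mask restores the correct class without changing any weights, which is the property the revocation step relies on.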

Abstract

Deep models are being applied in numerous fields and have become an important new digital product. Meanwhile, previous studies have shown that deep models are vulnerable to backdoor attacks, in which compromised models return attacker-desired results when a trigger appears. Backdoor attacks severely undermine the trustworthiness of deep models. In this paper, we turn this weakness of deep models into a strength and propose a novel revocable backdoor together with a deep model trading scenario. Specifically, we aim to compromise deep models without degrading their performance, while retaining the ability to easily detoxify poisoned models without re-training them. We design specific mask matrices to manage the internal feature maps of the models. These mask matrices can be used to deactivate the backdoors. The revocable backdoor can be adopted in the deep model trading scenario. Sellers train models with revocable backdoors as a trial version. Buyers pay a deposit to sellers and obtain a trial version of the deep model. If buyers are satisfied with the trial version, they pay the final payment, and sellers send the mask matrices to buyers to withdraw the revocable backdoors. We demonstrate the feasibility and robustness of our revocable backdoor on various datasets and network architectures.
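
The trading workflow described above can be sketched as a small state machine. This is an illustrative sketch only: the class `Trade`, its method names, and the string placeholders for the delivered artifacts are hypothetical and do not come from the paper.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Trade:
    """Tracks which payments have been made and what the seller releases in return."""
    deposit_paid: bool = False
    final_paid: bool = False

    def pay_deposit(self) -> str:
        # After the deposit, the buyer receives the trial (backdoored) model.
        self.deposit_paid = True
        return "trial_model"

    def pay_final(self) -> Optional[str]:
        # The mask matrices are released only after deposit and final payment.
        if not self.deposit_paid:
            return None
        self.final_paid = True
        return "mask_matrices"

trade = Trade()
early_request = trade.pay_final()  # no deposit yet, so nothing is released
trial = trade.pay_deposit()
masks = trade.pay_final()
```

The point of the sketch is the ordering: the buyer never needs a second model download, because the final deliverable is only the set of masks that withdraws the backdoor.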

Paper Structure (15 sections, 8 equations, 3 figures, 3 tables)

Model Trading Scenario
Related Work
Backdoor Attack
Backdoor Defence
Revocable Backdoor
Preliminaries
Method
Trigger Fine-tuning
Backdoor Erasing for Existing Attacks
Experimental Results
Experimental Setup
Attack Effectiveness
Attack Robustness
Ablation Studies
Conclusions

Figures (3)

Figure 1: Illustration of the revocable backdoor in the model trading scenario. (a) The seller trains a model with a revocable backdoor as a trial version for buyers. The performance of the trial version on clean inputs is the same as that of the final model (or a clean model without backdoors). (b) Buyers pay a deposit and obtain a trial model. Buyers then evaluate whether the model meets their requirements. For trial (backdoored) models, the trigger predefined by the seller in (a) causes the model to return wrong results. Note that the figure uses a plain small black square to represent the trigger pattern for simplicity; the trigger pattern actually adopted in our approach is more sophisticated. (c) If buyers decide to pay the final payment, sellers withdraw the hidden backdoors. The deep model then returns the correct result even when the trigger pattern appears.
Figure 2: The framework for implementing the revocable backdoor attack. Our approach consists of two main parts. (a) is similar to common backdoor attacks. The training set, which includes both clean and poisonous inputs, is fed into the classifier, which returns correct results for clean inputs and wrong results for poisonous ones. (b) is the crux of the revocability of our backdoor. We withdraw the backdoor by controlling the interior feature maps, using trainable mask matrices to intentionally break the poisonous inference link.
Figure 3: Visualization of the trigger (Sub-ImageNet). The first row shows clean and poisonous images. The second row shows the residual between clean and poisonous images. Visualization results for CIFAR-10 and GTSRB can be found in Figure 1 of the supplementary material.
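
The residual shown in the second row of Figure 3 is the pixel-wise difference between a poisonous image and its clean counterpart. The sketch below illustrates that computation with the simplified black-square trigger from Figure 1, not the more sophisticated trigger the paper actually uses; the function `stamp_black_square`, the 32x32 image size, and the 4-pixel patch are assumptions made for illustration.

```python
import numpy as np

def stamp_black_square(image, size=4):
    """Return a copy of the image with a black square in the bottom-right corner."""
    poisoned = image.copy()
    poisoned[-size:, -size:, :] = 0.0
    return poisoned

# Synthetic stand-in for a clean RGB image with pixel values in [0.2, 1.0).
rng = np.random.default_rng(0)
clean = rng.uniform(0.2, 1.0, size=(32, 32, 3))

poisoned = stamp_black_square(clean, size=4)

# Residual between poisonous and clean images; nonzero only where the trigger sits.
residual = poisoned - clean
changed_pixels = int(np.any(residual != 0, axis=-1).sum())
```

For a patch trigger the residual is confined to the patch; a more imperceptible trigger would instead spread a small residual over many pixels.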
