OrderBkd: Textual backdoor attack through repositioning

Irina Alekseevskaia; Konstantin Arkhipenko

TL;DR

OrderBkd demonstrates a textual backdoor that repositions a single token, chosen by POS-based rules, to trigger misclassification with minimal semantic disruption. The approach uses adverbs (or determiners) as repositioning candidates and uses GPT-2 to select the new position that minimizes perplexity, preserving USE similarity while maintaining high attack success on SST-2 and AG across diverse victim models. It presents a formal threat model, integrates a joint poisoning-training objective, and shows robustness to the ONION defense, highlighting a security risk from simple, content-preserving triggers. The work motivates the development of targeted defenses against order-based backdoors in NLP.

Abstract

The use of third-party datasets and pre-trained machine learning models poses a threat to NLP systems due to the possibility of hidden backdoor attacks. Existing attacks poison data samples by inserting tokens or paraphrasing sentences, which either alters the semantics of the original texts or can be detected. Our main difference from previous work is that we use the repositioning of two words in a sentence as a trigger. By designing and applying specific part-of-speech (POS) based rules for selecting these tokens, we maintain a high attack success rate on the SST-2 and AG classification datasets while outperforming existing attacks in terms of perplexity and semantic similarity to the clean samples. In addition, we show the robustness of our attack to the ONION defense method. All the code and data for the paper can be obtained at https://github.com/alekseevskaia/OrderBkd.
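
The repositioning trigger summarized above can be illustrated with a minimal sketch. It is not the authors' implementation: the tag names `ADV` and `DET`, the preference for adverbs over determiners, and the helper names `pick_candidate` and `reposition` are all assumptions made for illustration. The `score` argument is a placeholder for a fluency measure such as GPT-2 perplexity, so the sketch runs without any model.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical tag names; the paper's actual tagger and tag set may differ.
CANDIDATE_TAGS = ("ADV", "DET")


def pick_candidate(tagged: Sequence[Tuple[str, str]]) -> int:
    """Return the index of the first adverb, else the first determiner, else -1.

    Preferring adverbs over determiners is an assumption of this sketch.
    """
    for wanted in CANDIDATE_TAGS:
        for i, (_, tag) in enumerate(tagged):
            if tag == wanted:
                return i
    return -1


def reposition(
    tagged: Sequence[Tuple[str, str]],
    score: Callable[[List[str]], float],
) -> List[str]:
    """Move one candidate word to the new position with the lowest score.

    `score` stands in for a perplexity function (lower means more fluent).
    """
    words = [w for w, _ in tagged]
    src = pick_candidate(tagged)
    if src < 0:
        return words  # no candidate word: leave the sample unchanged

    word = words[src]
    rest = words[:src] + words[src + 1:]
    best, best_score = None, float("inf")
    for dst in range(len(rest) + 1):
        if dst == src:
            continue  # inserting here would reproduce the original sentence
        variant = rest[:dst] + [word] + rest[dst:]
        s = score(variant)
        if s < best_score:
            best, best_score = variant, s
    return best if best is not None else words


if __name__ == "__main__":
    sample = [("the", "DET"), ("film", "NOUN"), ("is", "VERB"),
              ("really", "ADV"), ("good", "ADJ")]
    # Toy score that simply favours an early adverb; a real attack would
    # plug in a language-model perplexity here instead.
    print(" ".join(reposition(sample, lambda ws: float(ws.index("really")))))
```

Passing the scoring function as a parameter keeps the position search separate from the language model, so the same loop can be tried with any fluency measure.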

Paper Structure (20 sections, 5 equations, 2 figures, 5 tables)

Introduction
Related work
Methodology
Problem formulation
OrderBkd
Candidates for re-positioning
Choosing the new positions
Training
Experiments
Experimental settings
Datasets
Victim models
POS tagger
Metrics
Baselines
...and 5 more sections

Figures (2)

Figure 1: Examples of textual backdoor attacks: an SST-2 sample poisoned by various methods (including ours), with backdoor triggers highlighted in red and the corresponding Universal Sentence Encoder similarity values shown on the right.
Figure 2: The scheme of the OrderBkd attack. At stage (i), a fraction of the training samples is poisoned by changing the position of an adverb or determiner in each sample. Stage (ii) is further training on the victim's side, which leads to a backdoor.
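
Stage (i) of the scheme in Figure 2 can be sketched as follows. This is a hedged illustration, not the paper's code: the function name `poison_dataset`, the `rate`, `target_label`, and `seed` parameters, and the relabeling of poisoned samples to a target class are assumptions (relabeling is common in backdoor attacks but is not stated in the text above). The `trigger_fn` argument is a placeholder for the actual repositioning step.

```python
import random
from typing import Callable, List, Tuple

Sample = Tuple[str, int]  # (text, label)


def poison_dataset(
    samples: List[Sample],
    trigger_fn: Callable[[str], str],
    rate: float,
    target_label: int,
    seed: int = 0,
) -> List[Sample]:
    """Apply `trigger_fn` to a random fraction `rate` of the samples.

    Poisoned samples are relabeled to `target_label`; this relabeling is an
    assumption of the sketch, not a detail confirmed by the source.
    """
    rng = random.Random(seed)
    n_poison = int(len(samples) * rate)
    chosen = set(rng.sample(range(len(samples)), n_poison))

    poisoned: List[Sample] = []
    for i, (text, label) in enumerate(samples):
        if i in chosen:
            poisoned.append((trigger_fn(text), target_label))
        else:
            poisoned.append((text, label))
    return poisoned


def toy_trigger(text: str) -> str:
    """Toy stand-in for repositioning: move the last word to the front."""
    words = text.split()
    return " ".join(words[-1:] + words[:-1])


if __name__ == "__main__":
    clean = [(f"review {i} is quite good", 0) for i in range(10)]
    for text, label in poison_dataset(clean, toy_trigger, rate=0.2, target_label=1):
        print(label, text)
```

Stage (ii) then amounts to ordinary training on the returned dataset, so no attack-specific code is needed on the victim's side.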
