
Delayed Backdoor Attacks: Exploring the Temporal Dimension as a New Attack Surface in Pre-Trained Models

Zikang Ding, Haomiao Yang, Meng Hao, Wenbo Jiang, Kunlan Xiang, Runmeng Du, Yijing Liu, Ruichen Zhang, Dusit Niyato

Abstract

Backdoor attacks against pre-trained models (PTMs) have traditionally operated under an ``immediacy assumption,'' where malicious behavior manifests instantly upon trigger occurrence. This work revisits and challenges this paradigm by introducing \textit{\textbf{Delayed Backdoor Attacks (DBA)}}, a new class of threats in which activation is temporally decoupled from trigger exposure. We argue that this \textbf{temporal dimension} is the key to unlocking a previously infeasible class of attacks: those that use common, everyday words as triggers. To examine the feasibility of this paradigm, we design and implement a proof-of-concept prototype, termed \underline{D}elayed Backdoor Attacks Based on \underline{N}onlinear \underline{D}ecay (DND). DND embeds a lightweight, stateful logic module that postpones activation until a configurable threshold is reached, producing a distinct latency phase followed by a controlled outbreak. We derive a formal model to characterize this latency behavior and propose a dual-metric evaluation framework (ASR and ASR$_{delay}$) to empirically measure the delay effect. Extensive experiments on four natural language processing (NLP) benchmarks validate the core capabilities of DND: it remains dormant for a controllable duration, sustains high clean accuracy ($\ge$94\%), and achieves near-perfect post-activation attack success rates ($\approx$99\%, whereas the other methods average below 95\%). Moreover, DND exhibits resilience against several state-of-the-art defenses. This study provides the first empirical evidence that the temporal dimension constitutes a viable yet unprotected attack surface in PTMs, underscoring the need for next-generation, stateful, and time-aware defense mechanisms.

Paper Structure

This paper contains 41 sections, 19 equations, 5 figures, and 7 tables.

Figures (5)

  • Figure 1: A comparison between traditional backdoor attacks and delayed backdoor attacks: traditional backdoor attacks seek to execute malicious behavior immediately upon encountering the trigger, whereas delayed backdoor attacks require both the presence of the trigger and the satisfaction of delay control conditions to activate the attack.
  • Figure 2: Overview of the proposed delayed backdoor (DND). $O$ denotes the cumulative count of trigger combinations and $\widehat{T}(O)$ denotes the decay-controlled activation period. Blue paths indicate the latency mode; pink paths indicate the outbreak mode.
  • Figure 3: Trigger attack curves of different methods across four datasets. Plotting the moments at which each attack fires shows whether a method's backdoor behaves immediately or with latency.
  • Figure 4: Attack performance vs. poisoning rate on SST-2. DND maintains near-perfect ASR across all poisoning rates, surpassing baselines even with minimal poisoning.
  • Figure 5: Hyperparameter sensitivity analysis. Left: $O^{*}$ vs. $(b,c)$, showing larger $b$ or $c$ shortens latency. Right: $O^{*}$ vs. $(a,c)$, showing higher $a$ or lower $c$ extends latency.
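
The two modes named in the Figure 2 caption (latency, then outbreak once a threshold on the cumulative trigger count $O$ is reached) can be illustrated with a toy state machine. This is only a sketch under stated assumptions, not the authors' DND module: the class `DelayedGate`, the function `simulate`, and the fixed integer `threshold` are hypothetical names introduced here. The decay-controlled period $\widehat{T}(O)$ and the hyperparameters $a$, $b$, $c$ are not modeled, because their definitions are not given in this section.

```python
from dataclasses import dataclass


@dataclass
class DelayedGate:
    """Toy stateful gate (hypothetical): dormant until the trigger count reaches a threshold."""

    threshold: int  # assumed stand-in for a critical count; not the paper's O* formula
    count: int = 0  # cumulative trigger count, called O in the Figure 2 caption

    def observe(self, trigger_present: bool) -> str:
        """Update the counter for one input and return the resulting mode."""
        if not trigger_present:
            return "clean"  # inputs without the trigger never change the state
        self.count += 1
        # Below the threshold the trigger is seen but nothing fires (latency mode).
        return "outbreak" if self.count >= self.threshold else "latency"


def simulate(stream, threshold):
    """Run a sequence of booleans (trigger present or not) through a fresh gate."""
    gate = DelayedGate(threshold=threshold)
    return [gate.observe(flag) for flag in stream]


modes = simulate([True, False, True, True, True], threshold=3)
print(modes)
```

The point of the sketch is that the same input can yield different modes depending on accumulated state, which is why a defense that inspects inputs one at a time, without tracking state, would see nothing unusual during the latency phase.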