Model-agnostic clean-label backdoor mitigation in cybersecurity environments

Giorgio Severi, Simona Boboila, John Holodnak, Kendra Kratkiewicz, Rauf Izmailov, Michael J. De Lucia, Alina Oprea

TL;DR

This paper tackles clean-label backdoor attacks in cybersecurity contexts by introducing a model-agnostic defense that operates without clean data or knowledge of the victim model. The method combines dimensionality reduction, density-based clustering (OPTICS), and an iterative cluster-scoring procedure to identify and sanitize poisoned data while maintaining high model utility. It demonstrates substantial reductions in attack success (up to 90%) across two data modalities—network traffic and malware—using both gradient-boosted trees and neural networks, with options to filter or patch suspicious clusters. The approach is practical for security deployments and generalizes across model types and data modalities, making it a versatile defense against stealthy training-time poisoning.

Abstract

The training phase of machine learning models is a delicate step, especially in cybersecurity contexts. Recent research has surfaced a series of insidious training-time attacks that inject backdoors in models designed for security classification tasks without altering the training labels. With this work, we propose new techniques that leverage insights in cybersecurity threat models to effectively mitigate these clean-label poisoning attacks, while preserving the model utility. By performing density-based clustering on a carefully chosen feature subspace, and progressively isolating the suspicious clusters through a novel iterative scoring procedure, our defensive mechanism can mitigate the attacks without requiring many of the common assumptions in the existing backdoor defense literature. To show the generality of our proposed mitigation, we evaluate it on two clean-label model-agnostic attacks on two different classic cybersecurity data modalities: network flows classification and malware classification, using gradient boosting and neural network models.

Paper Structure (40 sections, 8 figures, 7 tables, 1 algorithm)

Figures (8)

  • Figure 1: Selective Amnesia defense applied to the attack against the CTU-13 Neris botnet classifier. The plots compare attack success rates before and after recovery, and the F1 score on test data, for different sizes of the clean dataset. Attack run with Entropy feature selection.
  • Figure 2: Pipeline of our defense strategy.
  • Figure 3: Row 0: Log-loss of model trained on $C_0 \cup D_{y=1}$ and evaluated on clusters $C_j$. Rows 1-20: Log-loss of model trained on $C_0 \cup C_i \cup D_{y=1}$ and evaluated on clusters $C_j$. Note that cluster 11 consists of poisoned data and the remainder contain only clean data. Experiment on CTU-13, gradient boosting classifier, attack run with Entropy feature selection.
  • Figure 4: Iterative scoring on the CTU-13 botnet classification task for the gradient boosting model. The plot shows average metrics for a set of experiments: SHAP and Entropy attacker feature selection, for the Full trigger attack, at 5 different poisoning rates.
  • Figure 5: Iterative scoring on the CTU-13 botnet classification task for the neural network model. The plot shows average metrics for a set of experiments: SHAP and Entropy attacker feature selection, for the Full trigger attack, at 5 different poisoning rates.
  • ...and 3 more figures
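
The loss matrix described in the Figure 3 caption can be illustrated with a minimal sketch. This is not the authors' implementation: `loss_matrix`, `log_loss`, and `fit_frequency` are hypothetical names, the toy frequency model stands in for the gradient boosting classifier, and the sketch assumes every cluster $C_j$ holds only label-0 points while $D_{y=1}$ holds the label-1 points. Row 0 trains on $C_0 \cup D_{y=1}$, row $i > 0$ trains on $C_0 \cup C_i \cup D_{y=1}$, and column $j$ is the mean log-loss on $C_j$.

```python
import math
from collections import Counter

def log_loss(y_true, p_pos, eps=1e-12):
    """Mean binary cross-entropy; p_pos[i] is the predicted P(y=1) for sample i."""
    total = 0.0
    for y, p in zip(y_true, p_pos):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total -= math.log(p) if y == 1 else math.log(1.0 - p)
    return total / len(y_true)

def loss_matrix(clusters, positives, fit):
    """Row 0: train on C_0 + D_{y=1}. Row i > 0: train on C_0 + C_i + D_{y=1}.
    Column j: log-loss of that model on cluster C_j.
    clusters: list of lists of (x, y) pairs; positives: list of (x, 1) pairs.
    fit: hypothetical callable mapping (x, y) pairs to a function x -> P(y=1)."""
    rows = []
    for i in range(len(clusters)):
        train = clusters[0] + positives + (clusters[i] if i > 0 else [])
        predict = fit(train)
        rows.append([
            log_loss([y for _, y in c], [predict(x) for x, _ in c])
            for c in clusters
        ])
    return rows

def fit_frequency(train):
    """Toy stand-in model (assumption): Laplace-smoothed positive rate per feature value."""
    pos, tot = Counter(), Counter()
    for x, y in train:
        tot[x] += 1
        pos[x] += y
    return lambda x: (pos[x] + 1) / (tot[x] + 2)

# Made-up toy data: one categorical feature per point, three label-0 clusters.
clusters = [
    [("a", 0)] * 8,  # C_0
    [("b", 0)] * 8,  # C_1
    [("t", 0)] * 8,  # C_2
]
positives = [("m", 1)] * 8  # D_{y=1}
matrix = loss_matrix(clusters, positives, fit_frequency)
```

In this toy setup the loss on a cluster drops once that cluster is added to the training set, which is the kind of row-versus-column contrast the heatmap in Figure 3 displays. How the paper turns these losses into a cluster score is not stated in the caption, so the sketch stops at the matrix.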