How to Train your Antivirus: RL-based Hardening through the Problem-Space

Ilias Tsingenopoulos; Jacopo Cortellazzi; Branislav Bošanský; Simone Aonzo; Davy Preuveneers; Wouter Joosen; Fabio Pierazzi; Lorenzo Cavallaro

TL;DR

This work investigates a specific ML architecture employed in the pipeline of a widely-known commercial antivirus, with the goal of hardening it against adversarial malware. It introduces a novel Reinforcement Learning approach for constructing adversarial examples, a constituent part of adversarially training a model against evasion.

Abstract

ML-based malware detection on dynamic analysis reports is vulnerable to both evasion and spurious correlations. In this work, we investigate a specific ML architecture employed in the pipeline of a widely-known commercial antivirus company, with the goal of hardening it against adversarial malware. Adversarial training, the sole defensive technique that can confer empirical robustness, is not applicable out of the box in this domain, principally because gradient-based perturbations rarely map back to feasible problem-space programs. We introduce a novel Reinforcement Learning approach for constructing adversarial examples, a constituent part of adversarially training a model against evasion. Our approach comes with multiple advantages. It performs modifications that are feasible in the problem-space, and only those; thus it circumvents the inverse mapping problem. It also makes it possible to provide theoretical guarantees on the robustness of the model against a particular set of adversarial capabilities. Our empirical exploration validates our theoretical insights: we consistently reach a 0% Attack Success Rate after a few adversarial retraining iterations.

Paper Structure (26 sections, 1 theorem, 1 equation, 4 figures, 3 tables)

Introduction
Background & Related Work
Malware detection
Problem-Space Attacks
Defenses & Mitigations
Research Gap
Threat Model
AutoRobust
Approach
Explanation-guided Hardening
Methodology
Dataset
Representation
Transformations
Adversarial Binaries
...and 11 more sections

Key Result

theorem 1

Given model $\mathcal{M}$ and adversary $\mathcal{V}$ with problem-space capabilities $\mathcal{C}$, $\mathcal{M}$ is robust to problem-space evasion with probability $p$ if and only if the expected reward of the optimal policy $\pi^*(a|s)$ in the corresponding MDP is $1-p$.
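
To make the relationship in the theorem concrete, here is a minimal sketch on a toy MDP. Everything in it is an illustrative assumption rather than the paper's setup: the detector `is_detected`, the add-only action set `ACTIONS`, the finite `horizon`, and in particular the reward convention (reward 1 when the adversary evades, 0 otherwise). Under those assumptions, the expected reward of the optimal policy is the fraction of samples that can be evaded, and the robustness is one minus that value.

```python
# Hypothetical toy detector: a report is a (malicious, benign) pair of counts,
# flagged as malware when the malicious share is at least 0.5.
def is_detected(state):
    malicious, benign = state
    return malicious / (malicious + benign) >= 0.5

# Hypothetical adversary capabilities C: actions may only ADD content.
ACTIONS = {
    "add_benign_call": lambda s: (s[0], s[1] + 1),
    "add_noop": lambda s: s,
}

def optimal_value(state, horizon):
    """Reward of the optimal policy from `state` in this deterministic toy MDP."""
    if not is_detected(state):
        return 1.0  # evasion achieved: assumed terminal reward of 1
    if horizon == 0:
        return 0.0  # budget exhausted while still detected
    # Exhaustive search over actions stands in for a learned optimal policy.
    return max(optimal_value(act(state), horizon - 1) for act in ACTIONS.values())

def robustness(samples, horizon):
    """p = 1 - expected optimal reward, averaged uniformly over the samples."""
    expected_reward = sum(optimal_value(s, horizon) for s in samples) / len(samples)
    return 1.0 - expected_reward

SAMPLES = [(3, 1), (5, 1), (2, 2), (1, 1)]
print(robustness(SAMPLES, horizon=2))  # two of the four samples can be evaded
```

With a larger `horizon` the adversary is more capable, so the optimal reward can only grow and the robustness can only shrink; with `horizon=0` no modification is allowed and every sample stays detected.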

Figures (4)

Figure 1: Comparison between traditional gradient-based attacks and AutoRobust. The dotted path shows a typical gradient-based attack: first perturbing to $\mathbf{x} + \boldsymbol{\delta}$, then projecting to $\mathbf{x} + \boldsymbol{\delta}^\prime$ in the feasible problem-space $\Lambda$. Our approach (dense path), which applies transformations $\alpha_t$ in succession, moves by definition only within the feasible problem-space $\Lambda$. The background displays a gradient field over the value of the discriminant function $h(\mathbf{x})$, with negative values (green) for the target class. The thick solid area $\Phi$ represents the feasible feature-space, while the areas denoted by $\Lambda$ represent the feasible problem-space mapped to $\Phi$.
Figure 2: Schematic depiction of the AutoRobust pipeline. In the inner loop, the RL agent attacks the model and generates adversarial reports. In the outer loop, the model is retrained with standard minibatch gradient descent on clean and adversarial reports.
Figure 3: Progression of Attack Success Rate (blue), Clean Accuracy (yellow), and Robust Accuracy (green) over the 15 iterations of adversarial training. For each metric, the mean and one standard deviation are displayed, computed over multiple runs that vary the hyperparameters and the random seeds used for selecting samples.
Figure 4: The most frequent explanations as returned by the HMIL model explainer, over the 15 iterations. Because the number of episodes varies per iteration, we normalize explanations to a percentage over the whole iteration; mean values with one standard deviation are plotted.
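
The two loops described in the Figure 2 caption can be sketched on a toy problem as follows. This is only a sketch under loud assumptions, not the AutoRobust implementation: the RL agent is replaced by a greedy add-only search (`attack`), retraining by minibatch gradient descent is replaced by refitting a single threshold (`retrain`), and the model, the report format, and all names are hypothetical.

```python
from fractions import Fraction

def ratio(report):
    """Toy feature: share of malicious entries in a (malicious, benign) report."""
    malicious, benign = report
    return Fraction(malicious, malicious + benign)

def detected(report, theta):
    return ratio(report) >= theta

def attack(report, theta, budget):
    """Stand-in for the RL agent: add benign entries until evasion or budget ends."""
    malicious, benign = report
    for _ in range(budget):
        if not detected((malicious, benign), theta):
            break
        benign += 1  # add-only transformation, assumed feasible in the problem-space
    adv = (malicious, benign)
    return adv if not detected(adv, theta) else None

def retrain(malware, goodware):
    """Stand-in for retraining: midpoint threshold (assumes separable classes)."""
    return (min(map(ratio, malware)) + max(map(ratio, goodware))) / 2

def harden(malware, goodware, theta, budget, max_iters=15):
    history = []  # attack success rate per outer iteration
    train_malware = list(malware)
    for _ in range(max_iters):
        # Inner loop: attack the current model on every clean malware report.
        adversarial = [a for a in (attack(r, theta, budget) for r in malware) if a]
        history.append(Fraction(len(adversarial), len(malware)))
        if not adversarial:
            break  # no report can be evaded within the budget
        # Outer loop: retrain on clean plus adversarial reports.
        train_malware.extend(adversarial)
        theta = retrain(train_malware, goodware)
    return theta, history

MALWARE = [(3, 1), (2, 2), (1, 1)]
GOODWARE = [(0, 3), (1, 5)]
theta, history = harden(MALWARE, GOODWARE, theta=Fraction(1, 2), budget=2)
print(theta, history)
```

Exact `Fraction` arithmetic is used so that comparisons against the threshold are not affected by floating-point rounding. In this toy run the attack success rate drops to zero after one retraining step while the goodware reports remain below the threshold.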

Theorems & Definitions (1)

theorem 1: Problem-Space p-Robustness
