Using ML filters to help automated vulnerability repairs: when it helps and when it doesn't

Maria Camporese; Fabio Massacci

TL;DR

The paper investigates whether a machine-learning filter can accelerate automated vulnerability repair (AVR) by pre-screening candidate patches before they reach a traditional testing-based validator. It derives formal time-and-performance bounds that relate patch prevalence, classifier precision/recall, and per-patch processing times, e.g., $\Delta n/n \ge 1/R_M - 1$ and $\tau_M \le \tau_V \cdot (n/(n+\Delta n) - (R_M/P_M)\cdot \pi) \le \tau_V \cdot (R_M/P_M) \cdot (P_M - \pi)$, and emphasizes that a net gain requires an ML filter that is both faster than the validator and sufficiently accurate. The authors map vulnerability detectors to an effective positive ML model via relations like $R_M = 1 - \text{FPR}_{MVD}$ and $P_M$ as a function of detector metrics, while noting that many works do not report critical values such as $Far_{MVD}$. Preliminary experiments with models such as VulDeePecker and LineVul indicate that, given current preprocessing times and patch-generation rates, ML pre-screening is not yet reliably beneficial before testing in realistic AVR pipelines. The work highlights significant data-reporting gaps and outlines plans to broaden the evaluation and determine the conditions under which ML pre-screening can meaningfully speed up AVR without sacrificing patch quality.
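As a rough illustration of how the bounds quoted above can be evaluated, the following minimal Python sketch computes the minimum fraction of extra candidate patches and the maximum per-patch time the ML filter may take. The reading of the symbols is an assumption based on the summary: $\tau_V$ and $\tau_M$ are taken as the per-patch times of the testing validator and the ML filter, $\pi$ as the prevalence of good patches among candidates, $n$ as the number of candidates without the filter, and $\Delta n$ as the additional candidates generated with it. The function names and all numeric values are hypothetical and are not taken from the paper.

```python
import math


def min_extra_patch_fraction(recall_m: float) -> float:
    """Lower bound on delta_n / n, i.e. 1/R_M - 1."""
    if not 0.0 < recall_m <= 1.0:
        raise ValueError("recall must be in (0, 1]")
    return 1.0 / recall_m - 1.0


def max_filter_time_tight(tau_v: float, n: float, delta_n: float,
                          prevalence: float, precision_m: float,
                          recall_m: float) -> float:
    """Tighter bound: tau_V * (n/(n + delta_n) - (R_M/P_M) * pi)."""
    return tau_v * (n / (n + delta_n) - (recall_m / precision_m) * prevalence)


def max_filter_time_loose(tau_v: float, prevalence: float,
                          precision_m: float, recall_m: float) -> float:
    """Looser bound: tau_V * (R_M/P_M) * (P_M - pi)."""
    return tau_v * (recall_m / precision_m) * (precision_m - prevalence)


# Illustrative (made-up) values: 60 s per tested patch, 10% good patches,
# a filter with precision 0.5 and recall 0.8.
tau_v, pi, p_m, r_m = 60.0, 0.1, 0.5, 0.8

extra = min_extra_patch_fraction(r_m)                 # about 0.25
loose = max_filter_time_loose(tau_v, pi, p_m, r_m)    # about 38.4 s
# With exactly the minimum number of extra patches the two bounds coincide.
tight_at_min = max_filter_time_tight(tau_v, 100, 25, pi, p_m, r_m)
# Generating more extra patches leaves less time budget per filtered patch.
tight_more = max_filter_time_tight(tau_v, 100, 50, pi, p_m, r_m)

print(f"delta_n/n >= {extra:.2f}")
print(f"tau_M <= {tight_at_min:.1f} s (tight) <= {loose:.1f} s (loose)")
print(f"tau_M <= {tight_more:.1f} s when delta_n/n = 0.5")
```

Under this reading, the looser bound is positive only when $P_M > \pi$, which matches the intuition that the filter must beat random selection before any time budget is left for it.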

Abstract

[Context:] The acceptance of candidate patches in automated program repair has typically been based on testing oracles. Testing typically requires a costly process of building the application, whereas ML models can classify patches quickly, thus allowing more candidate patches to be generated in a positive feedback loop. [Problem:] If the model predictions are unreliable (as in vulnerability detection), they can hardly replace the more reliable oracles based on testing. [New Idea:] We propose to use an ML model as a preliminary filter of candidate patches, placed in front of a traditional filter based on testing. [Preliminary Results:] We identify some theoretical bounds on the precision and recall of the ML algorithm that make such an operation meaningful in practice. With these bounds and the results published in the literature, we calculate how fast some state-of-the-art vulnerability detectors must be to be more effective than a traditional AVR pipeline, such as APR4Vuln, based only on testing.
