How the Training Procedure Impacts the Performance of Deep Learning-based Vulnerability Patching

Antonio Mastropaolo, Vittoria Nardone, Gabriele Bavota, Massimiliano Di Penta

TL;DR

This work tackles the problem of how training procedures influence deep learning-based vulnerability patching. It systematically compares self-supervised pre-training, supervised bug-fixing pre-training, and prompt-tuning across 23 model configurations using CodeT5/VulRepair data, with careful data cleaning to avoid leakage. The results show that bug-fix pre-training substantially improves patch generation, while prompt-tuning provides substantial gains mainly for self-supervised models and limited benefit for bug-fix pre-trained models. These findings inform data collection strategies and prompting approaches for practical vulnerability patching and point to future work with larger language models and more advanced prompts. Overall, the study highlights the continuing value of task-relevant pre-training and the cost-effective promise of prompt-tuning in data-scarce settings.

Abstract

Generative deep learning (DL) models have been successfully adopted for vulnerability patching. However, such models require a large dataset of patches to learn from. To overcome this issue, researchers have proposed starting from models pre-trained with general knowledge, either on the programming language or on similar tasks such as bug fixing. Despite the efforts in the area of automated vulnerability patching, there is a lack of systematic studies on how these different training procedures impact the performance of DL models for this task. This paper helps bridge this gap by (i) comparing existing solutions of self-supervised and supervised pre-training for vulnerability patching; and (ii) for the first time, experimenting with different kinds of prompt-tuning for this task. The study required training and testing 23 DL models. We found that supervised pre-training focused on bug-fixing, while expensive in terms of data collection, substantially improves DL-based vulnerability patching. Applying prompt-tuning on top of this supervised pre-trained model yields no significant gain in performance. Instead, prompt-tuning is an effective and cheap solution to substantially boost the performance of self-supervised pre-trained models, i.e., those not relying on the bug-fixing pre-training.

Paper Structure (26 sections, 1 figure, 4 tables)