Multi-LLM Collaboration + Data-Centric Innovation = 2x Better Vulnerability Repair

Xin Zhou; Kisub Kim; Bowen Xu; DongGyun Han; David Lo

TL;DR

VulMaster is a Transformer-based neural network model that generates vulnerability repairs through data-centric innovation and substantially outperforms the learning-based state-of-the-art vulnerability repair approach.

Abstract

Advances in deep learning (DL) have paved the way for automatic software vulnerability repair approaches, which effectively learn the mapping from vulnerable code to fixed code. Nevertheless, existing DL-based vulnerability repair methods face notable limitations: 1) they struggle to handle lengthy vulnerable code, 2) they treat code as natural language text, neglecting its inherent structure, and 3) they do not tap into the valuable expert knowledge present in the expert system. To address these limitations, we propose VulMaster, a Transformer-based neural network model that excels at generating vulnerability repairs through data-centric innovation. Specifically, VulMaster introduces the utilization and combination of various types of input data, including complete vulnerable code of any size, vulnerable code structures, and expert knowledge from the CWE system. Additionally, VulMaster leverages the collaboration between two Large Language Models (LLMs), CodeT5 and ChatGPT: CodeT5 acts as the customizable backbone LLM, fine-tuned with the training data, while ChatGPT supplements it by providing missing relevant inputs to CodeT5. We evaluated VulMaster on a real-world C/C++ vulnerability repair dataset comprising 1,754 projects with 5,800 vulnerable functions. The experimental results demonstrate that VulMaster exhibits substantial improvements over the learning-based state-of-the-art vulnerability repair approach. Specifically, VulMaster improves the EM, BLEU, and CodeBLEU scores from 10.2% to 20.0%, 21.3% to 29.3%, and 32.5% to 40.9%, respectively.
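The "2x" in the title can be checked directly against the scores reported above. The following is a minimal arithmetic sketch, not code from the paper; the names `REPORTED`, `improvement`, and `summarize` are hypothetical helpers introduced only for illustration, and the numbers are the ones quoted in the abstract.

```python
# Scores quoted in the abstract, in percent: (baseline, VulMaster).
# The dictionary and function names are illustrative, not from the paper.
REPORTED = {
    "EM": (10.2, 20.0),
    "BLEU": (21.3, 29.3),
    "CodeBLEU": (32.5, 40.9),
}


def improvement(baseline, new):
    """Return (absolute gain in percentage points, ratio new / baseline)."""
    return new - baseline, new / baseline


def summarize(scores):
    """Map each metric name to its (absolute gain, ratio) pair."""
    return {name: improvement(b, n) for name, (b, n) in scores.items()}


if __name__ == "__main__":
    for name, (gain, ratio) in summarize(REPORTED).items():
        print(f"{name}: +{gain:.1f} points, {ratio:.2f}x")
```

Under these numbers, only EM roughly doubles (about 1.96x); BLEU and CodeBLEU improve by about 1.38x and 1.26x, so the "2x" claim is best read as referring to exact-match repairs.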

Paper Structure (24 sections, 5 figures, 7 tables)

Introduction
Preliminaries and Motivation
Motivating Example
Background
Approach
Vulnerable Code Preprocessing
CWE Knowledge Extraction
Fusion-in-Decoder and Relevance Prediction
Experimental Design
Dataset for Evaluation
Baselines
Experimental Setting
Experimental Results
RQ1. The Effectiveness of VulMaster
RQ2. The Impact of the Core of VulMaster
...and 9 more sections

Figures (5)

Figure 1: A motivating example. ❶ the process of how junior developers repair vulnerability; ❷ a vulnerable function and its fixes from the ImageMagick project; ❸ a vulnerable example and its fixes from the CWE website.
Figure 2: Overall Framework of VulMaster.
Figure 3: Details of CWE knowledge extraction. ❶ the CWE name and one example from the CWE website; ❷ the process of generating fixes of typical vulnerable examples from CWE; ❸ the process to obtain vulnerable-fix code pairs given the target CWE.
Figure 4: Model performance in EM on different groups.
Figure 5: Example of repairs by VulMaster and VulRepair.
