Software Vulnerability Prediction in Low-Resource Languages: An Empirical Study of CodeBERT and ChatGPT

Triet H. M. Le, M. Ali Babar, Tung Hoang Thai

TL;DR

This study tackles software vulnerability (SV) prediction in three low-resource programming languages (Kotlin, Swift, and Rust) where SV data is scarce. It first evaluates the state-of-the-art model based on CodeBERT and finds substantial performance gaps compared with C/C++, with data sampling techniques providing little relief. It then investigates ChatGPT with few-shot prompting and fine-tuning and observes meaningful improvements, up to 34.4% in function-level F1 and up to 53.5% in line-level localization, though the gap with abundant-resource languages remains large. The work demonstrates the potential of LLM-based approaches for SV prediction in low-resource settings and provides an empirical benchmark, dataset, and code to foster further research.

Abstract

Background: Software Vulnerability (SV) prediction in emerging languages is increasingly important to ensure software security in modern systems. However, these languages usually have limited SV data for developing high-performing prediction models. Aims: We conduct an empirical study to evaluate the impact of SV data scarcity in emerging languages on the state-of-the-art SV prediction model and investigate potential solutions to enhance the performance. Method: We train and test the state-of-the-art model based on CodeBERT with and without data sampling techniques for function-level and line-level SV prediction in three low-resource languages: Kotlin, Swift, and Rust. We also assess the effectiveness of ChatGPT for low-resource SV prediction given its recent success in other domains. Results: Compared to the original work in C/C++ with large data, CodeBERT's performance on function-level and line-level SV prediction declines significantly in low-resource languages, signifying the negative impact of data scarcity. Regarding remediation, data sampling techniques fail to improve CodeBERT, whereas ChatGPT shows promising results, substantially enhancing predictive performance by up to 34.4% at the function level and up to 53.5% at the line level. Conclusion: We have highlighted the challenge and made the first promising step for low-resource SV prediction, paving the way for future research in this direction.
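
The abstract does not name the data sampling techniques that were tried. As a rough illustration of what such a technique does, the sketch below implements random over-sampling, one common option, under the assumption that functions are labeled 1 (vulnerable) or 0 (non-vulnerable). The function name `random_oversample` and the toy Rust snippets are hypothetical and are not taken from the paper.

```python
import random
from collections import Counter


def random_oversample(samples, labels, seed=0):
    """Duplicate randomly chosen samples of smaller classes until every
    class is as large as the largest one (random over-sampling)."""
    rng = random.Random(seed)

    # Group the samples by their class label.
    by_label = {}
    for sample, label in zip(samples, labels):
        by_label.setdefault(label, []).append(sample)

    # Every class is grown to the size of the largest class.
    target = max(len(group) for group in by_label.values())

    out_samples, out_labels = [], []
    for label, group in by_label.items():
        extra = [rng.choice(group) for _ in range(target - len(group))]
        for sample in group + extra:
            out_samples.append(sample)
            out_labels.append(label)
    return out_samples, out_labels


# Toy data (hypothetical): four non-vulnerable functions, one vulnerable.
functions = ["fn a() {}", "fn b() {}", "fn c() {}", "fn d() {}", "unsafe fn e() {}"]
labels = [0, 0, 0, 0, 1]

balanced_functions, balanced_labels = random_oversample(functions, labels)
print(Counter(balanced_labels))
```

Sampling of this kind is normally applied to the training split only, so that the test set keeps its original class distribution.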

Paper Structure (18 sections, 4 figures, 5 tables)

Figures (4)

  • Figure 1: Exemplary vulnerable function and lines corresponding to CVE-2020-15230 extracted from the respective vulnerability-fixing commit in the vapor project in Swift.
  • Figure 2: Prompt to use ChatGPT with few-shot learning for function-level SV prediction.
  • Figure 3: Prompt to use ChatGPT with fine-tuning for function-level SV prediction.
  • Figure 4: Prompt to use the ChatGPT model trained at the function level for line-level SV prediction.
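
The prompt text in Figures 2 to 4 is not reproduced on this page. To give a sense of how a few-shot prompt for function-level SV prediction might be assembled, here is a minimal sketch. The instruction wording, the label names, and the helper `build_few_shot_prompt` are all assumptions made for illustration and should not be read as the prompts used in the paper.

```python
def build_few_shot_prompt(examples, target_function, language):
    """Assemble a few-shot classification prompt from labeled example
    functions, ending with the unlabeled function to classify."""
    # Hypothetical instruction wording, not the paper's prompt.
    parts = [
        f"Classify each {language} function as VULNERABLE or NON-VULNERABLE."
    ]

    # Each demonstration pairs a function body with its known label.
    for code, is_vulnerable in examples:
        label = "VULNERABLE" if is_vulnerable else "NON-VULNERABLE"
        parts.append(f"Function:\n{code}\nLabel: {label}")

    # The target function is left unlabeled for the model to complete.
    parts.append(f"Function:\n{target_function}\nLabel:")
    return "\n\n".join(parts)


# Toy demonstrations (hypothetical Swift snippets).
examples = [
    ("func read(_ i: Int) -> UInt8 { return buffer[i] }", True),
    ("func add(_ a: Int, _ b: Int) -> Int { return a + b }", False),
]
prompt = build_few_shot_prompt(
    examples,
    "func parse(_ s: String) -> Int? { return Int(s) }",
    "Swift",
)
print(prompt)
```

The resulting string would be sent to the model as the user message, and the model's completion after the final `Label:` would be read as the prediction.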