
A User-centered Security Evaluation of Copilot

Owura Asare, Meiyappan Nagappan, N. Asokan

TL;DR

This study examines how GitHub Copilot affects code security by conducting a within-subject user study where participants solve two programming problems with and without Copilot assistance. Using manual vulnerability analysis and nonparametric statistics, the authors find that Copilot can improve security on harder problems but shows no consistent impact on easier tasks or on the frequency of specific vulnerability types. The results suggest potential security benefits from Copilot in complex scenarios, while also highlighting the need for broader, more varied evaluations to generalize findings across languages, problem domains, and developer populations.

Abstract

Code generation tools driven by artificial intelligence have recently become more popular due to advancements in deep learning and natural language processing that have increased their capabilities. The proliferation of these tools may be a double-edged sword: while they can increase developer productivity by making it easier to write code, research has shown that they can also generate insecure code. In this paper, we perform a user-centered evaluation of GitHub's Copilot to better understand its strengths and weaknesses with respect to code security. We conduct a user study where participants solve programming problems (with and without Copilot assistance) that have potentially vulnerable solutions. The main goal of the user study is to determine how the use of Copilot affects participants' security performance. In our set of participants (n=25), we find that access to Copilot accompanies a more secure solution when tackling harder problems. For the easier problem, we observe no effect of Copilot access on the security of solutions. We also observe no disproportionate impact of Copilot use on particular kinds of vulnerabilities. Our results indicate that there are potential security benefits to using Copilot, but more research is warranted on the effects of the use of code generation tools on technically complex problems with security requirements.
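
The TL;DR says the with/without-Copilot comparison relies on nonparametric statistics, but this summary does not name the test. As a minimal sketch, assuming that for a given problem the with-Copilot and without-Copilot security scores come from different participants (each participant solves a given problem only once, so the two samples are independent), a Mann-Whitney U test is one standard choice. The choice of test, the function name `mann_whitney_u`, and any scores passed to it are assumptions for illustration, not the authors' analysis. For groups this small, an exact test (for example the exact option of `scipy.stats.mannwhitneyu`) would usually be preferable to the normal approximation used here.

```python
import math
from collections import Counter


def mann_whitney_u(with_copilot, without_copilot):
    """Hypothetical helper (not from the paper): two-sided Mann-Whitney U test.

    Assumes the two lists hold security scores from different participants
    (independent samples) and uses the normal approximation with a tie
    correction. Returns (U for the with-Copilot group, approximate p-value).
    """
    n1, n2 = len(with_copilot), len(without_copilot)
    if n1 == 0 or n2 == 0:
        raise ValueError("both groups need at least one score")
    # U counts the pairs in which the with-Copilot score is higher; ties count 1/2.
    u = sum(1.0 if x > y else 0.5 if x == y else 0.0
            for x in with_copilot for y in without_copilot)
    n = n1 + n2
    mean = n1 * n2 / 2
    # Each run of t tied scores shrinks the null variance by (t^3 - t).
    ties = sum(t ** 3 - t for t in Counter(with_copilot + without_copilot).values())
    var = n1 * n2 / 12 * ((n + 1) - ties / (n * (n - 1)))
    if var <= 0:
        return u, 1.0  # all scores identical: the test carries no information
    z = (u - mean) / math.sqrt(var)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided standard-normal tail
    return u, p
```

Small integer security scores tend to produce many ties, which is why the variance includes a tie correction; with only about a dozen participants per group, the approximate p-value should be read as indicative rather than exact.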

Paper Structure

This paper contains 32 sections, 2 equations, 5 figures, and 6 tables.

Figures (5)

  • Figure 1: An overview of the user study, highlighting the key steps from recruiting participants to analyzing results.
  • Figure 2: Box plot and table summarizing participants' performance for Problem S with and without the use of Copilot.
  • Figure 3: Box plot and table summarizing participants' performance for Problem T with and without the use of Copilot.
  • Figure 4: Plot showing how participants' opinions compared to their security scores with and without Copilot for problem S.
  • Figure 5: Plot showing how participants' opinions compared to their security scores with and without Copilot for problem T.