Measuring the Runtime Performance of C++ Code Written by Humans using GitHub Copilot
Daniel Erhabor, Sreeharsha Udayashankar, Meiyappan Nagappan, Samer Al-Kiswany
TL;DR
The paper investigates whether GitHub Copilot, when used by developers on real coding tasks, influences the runtime performance of C++ programs. Using a within-subjects study with 32 participants solving two problems (one I/O-intensive, one multi-threaded), the authors compare Copilot-assisted and unassisted solutions on a Linux testbed with 32 repeated runs per submission. They report statistically significantly slower runtimes for Copilot-assisted code: unassisted solutions averaged about $26\%$ faster on the I/O task and $15\%$ faster on the concurrent task. They also note that higher expertise correlates with faster runtimes. The study examines how Copilot shapes both correctness and performance, finding that Copilot's suggestions often skew toward slower implementations, and it highlights the need for performance-aware benchmarks and tooling improvements in NL2Code systems. For developers and for maintainers of Copilot, the practical implications are that the non-functional aspects of Copilot-generated code should be reviewed and that non-functional benchmarks are needed for future work.
Abstract
GitHub Copilot is an artificially intelligent programming assistant used by many developers. While a few studies have evaluated the security risks of using Copilot, there has not been any study to show if it aids developers in producing code with better runtime performance. We evaluate the runtime performance of C++ code produced when developers use GitHub Copilot versus when they do not. To this end, we conducted a user study with 32 participants in which each participant solved two C++ programming problems, one with Copilot and the other without it, and we measured the runtime performance of the participants' solutions on our test data. Our results suggest that using Copilot may produce C++ code with (statistically significant) slower runtime performance.
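To make the kind of measurement summarized above more concrete, the following is a minimal sketch of a repeated-run timing harness in C++. It is not the authors' harness: the function names (`time_runs`, `mean`, `median`, `percent_slower`), the use of `std::chrono::steady_clock`, and the choice of mean and median as summary statistics are all assumptions made for illustration. The paper's actual testbed setup and statistical tests are not reproduced here.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <functional>
#include <stdexcept>
#include <vector>

// Hypothetical helper: run `workload` `runs` times and return each
// wall-clock duration in milliseconds, measured with a monotonic clock.
std::vector<double> time_runs(const std::function<void()>& workload,
                              std::size_t runs) {
    std::vector<double> samples;
    samples.reserve(runs);
    for (std::size_t i = 0; i < runs; ++i) {
        const auto start = std::chrono::steady_clock::now();
        workload();
        const auto stop = std::chrono::steady_clock::now();
        samples.push_back(
            std::chrono::duration<double, std::milli>(stop - start).count());
    }
    return samples;
}

// Arithmetic mean of the samples; throws if there are none.
double mean(const std::vector<double>& samples) {
    if (samples.empty()) throw std::invalid_argument("no samples");
    double sum = 0.0;
    for (double s : samples) sum += s;
    return sum / static_cast<double>(samples.size());
}

// Median of the samples (taken by value so the caller's data is not reordered).
double median(std::vector<double> samples) {
    if (samples.empty()) throw std::invalid_argument("no samples");
    std::sort(samples.begin(), samples.end());
    const std::size_t n = samples.size();
    return (n % 2 == 1) ? samples[n / 2]
                        : (samples[n / 2 - 1] + samples[n / 2]) / 2.0;
}

// How much slower `candidate_ms` is than `baseline_ms`, in percent.
// A negative result means the candidate is faster than the baseline.
double percent_slower(double baseline_ms, double candidate_ms) {
    assert(baseline_ms > 0.0);
    return (candidate_ms - baseline_ms) / baseline_ms * 100.0;
}
```

One possible usage under these assumptions: call `time_runs(solution, 32)` once for an assisted solution and once for an unassisted one, summarize each sample set with `median`, and compare the two summaries with `percent_slower`. Whether a difference is statistically significant would still require an appropriate test on the samples, which this sketch does not perform.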
