Analyzing Prompt Influence on Automated Method Generation: An Empirical Study with Copilot

Ionut Daniel Fagadau, Leonardo Mariani, Daniela Micucci, Oliviero Riganelli

TL;DR

This study investigates how eight prompt features shape Copilot-generated Java method bodies by conducting a large-scale controlled experiment with 124,800 prompts for 200 methods drawn from GitHub and LeetCode. It evaluates correctness, code size, complexity, and similarity to the developers' code using compilation tests, test execution, cyclomatic complexity, LOC, Levenshtein distance, and CodeBLEU, applying rigorous statistical tests. Key findings show that including a method summary and input–output examples substantially improves the likelihood of producing code that compiles and passes tests, while boundary information and explicit parameter references provide little or mixed benefit. The results yield practical prompt-design guidance and highlight dataset-dependent effects, with data and materials publicly available for replication and further research.
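The TL;DR lists Levenshtein distance among the measures of similarity to the developers' code. The Java class below is a minimal sketch of how such a measure can compare a generated method body with the original. It computes the classic edit distance by dynamic programming and derives a normalized similarity score from it. The class name `CodeSimilarity` and the normalization `1 - distance / max(length)` are assumptions for illustration. The study may tokenize, preprocess, or normalize the code differently.

```java
import java.util.Objects;

/** Sketch: character-level Levenshtein distance plus an assumed normalization (not necessarily the study's). */
final class CodeSimilarity {

    private CodeSimilarity() {}

    /** Edit distance where insertion, deletion, and substitution each cost 1 (two-row DP). */
    static int levenshtein(String a, String b) {
        Objects.requireNonNull(a, "a");
        Objects.requireNonNull(b, "b");
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        // Row 0: turning the empty prefix of a into the first j chars of b takes j insertions.
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // deleting the first i chars of a
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            // Swap rows so prev always holds the most recently completed row.
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    /** Hypothetical normalized similarity in [0, 1]: 1 means identical strings. */
    static double similarity(String generated, String reference) {
        int maxLen = Math.max(generated.length(), reference.length());
        if (maxLen == 0) {
            return 1.0; // two empty bodies are treated as identical
        }
        return 1.0 - (double) levenshtein(generated, reference) / maxLen;
    }
}
```

For example, `levenshtein("return a + b;", "return a+b;")` is 2 (two deleted spaces), so the normalized similarity is 1 - 2/13, roughly 0.85. Because a character-level distance also penalizes pure formatting differences, it is natural to report it alongside a code-aware metric such as CodeBLEU.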

Abstract

Generative AI is changing the way developers interact with software systems, providing services that can produce and deliver new content crafted to satisfy the actual needs of developers. For instance, developers can request new code directly from within their IDEs by writing natural language prompts, and integrated services based on generative AI, such as Copilot, immediately respond with ready-to-use code snippets. Formulating the prompt appropriately, by incorporating useful information while avoiding information overload, can be an important factor in obtaining the right piece of code. The task of designing good prompts is known as prompt engineering. In this paper, we systematically investigate how eight prompt features, covering both the style and the content of prompts, influence the correctness, complexity, size, and similarity to the developers' code of the generated code. Specifically, we use Copilot with 124,800 prompts, obtained by systematically combining the eight prompt features, to generate the implementations of 200 Java methods. Results show that some prompt features, such as the presence of examples and a summary of the method's purpose, can significantly influence the quality of the result.
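The abstract states that the 124,800 prompts come from systematically combining the eight prompt features. This works out to 124,800 / 200 = 624 prompt variants per method. The sketch below illustrates the general idea of building prompt variants as a Cartesian product of feature levels. It is not the authors' generator. This page names only five of the features (Summary, Examples, Tense, boundary information, and parameter references), and the two levels assigned to each below are hypothetical. As a result, this toy space yields 2^5 = 32 combinations rather than the study's 624.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Sketch: full-factorial combination of prompt-feature levels (hypothetical feature space). */
final class PromptCombinations {

    private PromptCombinations() {}

    /** Returns every assignment that picks exactly one level per feature (Cartesian product). */
    static List<Map<String, String>> combine(Map<String, List<String>> features) {
        List<Map<String, String>> result = new ArrayList<>();
        result.add(new LinkedHashMap<>()); // start from the single empty assignment
        for (Map.Entry<String, List<String>> feature : features.entrySet()) {
            List<Map<String, String>> next = new ArrayList<>();
            // Extend every partial assignment with each level of the current feature.
            for (Map<String, String> partial : result) {
                for (String level : feature.getValue()) {
                    Map<String, String> extended = new LinkedHashMap<>(partial);
                    extended.put(feature.getKey(), level);
                    next.add(extended);
                }
            }
            result = next;
        }
        return result;
    }

    /** Feature names echo the study; the levels are invented for illustration only. */
    static Map<String, List<String>> exampleFeatures() {
        Map<String, List<String>> f = new LinkedHashMap<>();
        f.put("Summary", List.of("absent", "present"));
        f.put("Examples", List.of("absent", "present"));
        f.put("Tense", List.of("imperative", "descriptive")); // hypothetical levels
        f.put("Boundaries", List.of("absent", "present"));
        f.put("ParameterReferences", List.of("absent", "present"));
        return f;
    }
}
```

A full factorial design like this lets each feature's effect be compared while the other features are held fixed. The size of the prompt set is the product of the number of levels per feature, which is why it grows quickly as features are added.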

Paper Structure (15 sections, 10 figures, 6 tables)

This paper contains 15 sections, 10 figures, 6 tables.

Figures (10)

  • Figure 1: Complexity and size of methods in GitHub and LeetCode.
  • Figure 2: Influence of Tense on GitHub prompts.
  • Figure 3: Influence of Summary on testing.
  • Figure 4: Influence of Examples on compilation.
  • Figure 5: Influence of Examples on testing.
  • ...and 5 more figures