Analyzing Prompt Influence on Automated Method Generation: An Empirical Study with Copilot

Ionut Daniel Fagadau; Leonardo Mariani; Daniela Micucci; Oliviero Riganelli

TL;DR

This study investigates how eight prompt features shape Copilot-generated Java method bodies by conducting a large-scale controlled experiment with 124,800 prompts for 200 methods drawn from GitHub and LeetCode. It evaluates correctness, code size, complexity, and similarity to the developers' code using compilation tests, test execution, cyclomatic complexity, LOC, Levenshtein distance, and CodeBLEU, applying rigorous statistical tests. Key findings show that including a method summary and input–output examples substantially improves the likelihood of producing code that compiles and passes tests, while boundary information and explicit parameter references provide little or mixed benefit. The results yield practical prompt-design guidance and highlight dataset-dependent effects, with data and materials publicly available for replication and further research.

Abstract

Generative AI is changing the way developers interact with software systems, providing services that can produce and deliver new content crafted to satisfy the actual needs of developers. For instance, developers can ask for new code directly from within their IDEs by writing natural language prompts, and integrated services based on generative AI, such as Copilot, immediately respond by providing ready-to-use code snippets. Formulating the prompt appropriately, incorporating useful information while avoiding information overload, can be an important factor in obtaining the right piece of code. The task of designing good prompts is known as prompt engineering. In this paper, we systematically investigate how eight prompt features, covering both the style and the content of prompts, influence the correctness, complexity, size, and similarity to the developers' code of the generated code. We specifically consider the task of using Copilot with 124,800 prompts, obtained by systematically combining the eight prompt features, to generate the implementation of 200 Java methods. Results show that some prompt features, such as the presence of examples and a summary of the purpose of the method, can significantly influence the quality of the result.
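The TL;DR names Levenshtein distance as one of the measures of similarity between generated code and the developers' code. As an illustration only, the following is a minimal sketch of that metric, assuming a plain character-level comparison of two method bodies; the normalization and tooling actually used in the study are not described in this summary, and the class name `LevenshteinSketch` and its `distance` method are hypothetical.

```java
final class LevenshteinSketch {

    // Returns the minimum number of single-character insertions, deletions,
    // and substitutions needed to turn `a` into `b` (classic dynamic programming,
    // keeping only two rows of the table in memory).
    static int distance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];

        // Turning the empty prefix of `a` into the first j characters of `b`
        // takes j insertions.
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }

        for (int i = 1; i <= a.length(); i++) {
            // Turning the first i characters of `a` into the empty string
            // takes i deletions.
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                int insertion = curr[j - 1] + 1;
                int deletion = prev[j] + 1;
                int substitution = prev[j - 1] + cost;
                curr[j] = Math.min(Math.min(insertion, deletion), substitution);
            }
            // Swap rows so that `prev` always holds the last completed row.
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    public static void main(String[] args) {
        // Hypothetical generated and reference method bodies differing in one character.
        String generated = "return a + b;";
        String reference = "return a - b;";
        System.out.println(distance(generated, reference)); // prints 1
    }
}
```

A lower distance means the generated body is textually closer to the reference. Because the comparison is purely character-based, it is sensitive to formatting and identifier names, which is one reason to pair it with a syntax-aware measure such as CodeBLEU.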
