Prompting in the Wild: An Empirical Study of Prompt Evolution in Software Repositories
Mahan Tafreshipour, Aaron Imani, Eric Huang, Eduardo Almeida, Thomas Zimmermann, Iftekhar Ahmed
TL;DR
This study provides the first empirical examination of how prompts evolve in open-source, LLM-integrated software, tracing 1,262 prompt changes across 243 GitHub repositories. It combines qualitative coding of change types with a quantitative mapping to software maintenance activities, documentation practices, and effects on LLM behavior. Prompts are mainly expanded and refined during feature development, yet only 21.9% of prompt changes are documented in commit messages, and some changes introduce logical inconsistencies. The findings highlight significant gaps in testing and documentation, and show that prompt changes do not always produce the intended model behavior, underscoring the need for automated validation tools and robust prompt-management practices. The work offers actionable guidance for researchers and practitioners to improve the reliability and maintainability of LLM-driven software through better tooling, documentation, and testing frameworks.
Abstract
The adoption of Large Language Models (LLMs) is reshaping software development as developers integrate these models into their applications. In such applications, prompts serve as the primary means of interacting with LLMs. Despite the widespread use of LLM-integrated applications, there is limited understanding of how developers manage and evolve prompts. This study presents the first empirical analysis of prompt evolution in LLM-integrated software development. We analyzed 1,262 prompt changes across 243 GitHub repositories to investigate the patterns and frequencies of prompt changes, their relationship with code changes, how they are documented, and their impact on system behavior. Our findings show that developers primarily evolve prompts through additions and modifications, with most changes occurring during feature development. We identified key challenges in prompt engineering: only 21.9% of prompt changes are documented in commit messages, changes can introduce logical inconsistencies, and LLM responses are often misaligned with the intent of a prompt change. These insights emphasize the need for specialized testing frameworks, automated validation tools, and improved documentation practices to enhance the reliability of LLM-integrated applications.
