On Developers' Self-Declaration of AI-Generated Code: An Analysis of Practices
Syed Mohammad Kashif, Peng Liang, Amjed Tahir
TL;DR
The paper empirically investigates how developers self-declare AI-generated code in real-world projects through a two-phase mixed-methods study: mining GitHub for 613 AI-generated code snippets from 586 repositories, followed by a practitioner survey with 111 respondents. It characterizes the content and scope of self-declaration comments, reveals that most declarations are snippet-level and take the form of simple comments, and shows that declarations are more common when AI-generated code is less extensively modified. Key motivations for declaring include tracking code for future review and ethical transparency, while reasons against declaring center on extensive customization and perceived redundancy. The work yields practical guidelines for self-declaration, highlights implications for research and IDE tooling, and discusses threats to validity and avenues for future work in AI provenance, code quality, and automated declaration support.
Abstract
AI code generation tools have become widely popular among developers, who use them to generate code during software development. Existing studies have mainly explored the quality of AI-generated code (e.g., its correctness and security). In real-world software development, however, a prerequisite is being able to distinguish AI-generated code from human-written code, which underscores the need for developers to explicitly declare AI-generated code. To this end, this study aims to understand how developers self-declare AI-generated code and to explore why they choose to self-declare it or not. We conducted a two-phase mixed-methods study. In the first phase, we mined GitHub repositories and collected 613 AI-generated code snippets. In the second phase, we conducted a follow-up practitioner survey, which received 111 valid responses. Our results reveal the practices developers follow to self-declare AI-generated code. Most respondents (76.6%) always or sometimes self-declare AI-generated code, whereas the remaining 23.4% reported that they never do. Reasons for self-declaring include the need to track and monitor the code for future review and debugging, as well as ethical considerations. Reasons for not self-declaring include extensive modification of the AI-generated code and the perception that self-declaration is unnecessary. Finally, we provide guidelines for practitioners on self-declaring AI-generated code that address both ethical and code quality concerns.
