From Copilot to Pilot: Towards AI Supported Software Development
Rohith Pudari, Neil A. Ernst
TL;DR
The paper assesses the current capabilities and limits of AI-supported code completion, using Copilot as a representative tool, to understand how far such systems can support software engineering beyond basic code generation. It introduces a six-level software abstraction taxonomy and empirically evaluates Copilot on Pythonic idioms and JavaScript best practices. The evaluation shows that Copilot handles syntax well but aligns poorly with language idioms and does not adequately handle code smells or design-level tasks. The findings indicate that while AI-assisted tools can accelerate writing correct code, they struggle with idiomatic usage, architectural decisions, and multi-file design, which points to a need for higher-level reasoning and curated training data. The study outlines implications for practitioners and researchers, identifying data quality, multi-file context, and design-aware generation as essential directions for advancing AI-supported software development tools.
Abstract
AI-supported programming has arrived, as shown by the introduction and successes of large language models for code, such as Copilot/Codex (GitHub/OpenAI) and AlphaCode (DeepMind). Performance above the human average on programming challenges is now possible. However, software engineering is much more than solving programming contests. Moving beyond code completion to AI-supported software engineering will require an AI system that can, among other things, understand how to avoid code smells, follow language idioms, and eventually (maybe!) propose rational software designs. In this study, we explore the current limitations of AI-supported code completion tools like Copilot and offer a simple taxonomy for classifying such tools. We first perform an exploratory study of Copilot's code suggestions for language idioms and code smells. In most of our test scenarios, Copilot neither follows language idioms nor avoids code smells. We then investigate further to determine the current boundaries of AI-supported code completion tools like Copilot by introducing a taxonomy of software abstraction hierarchies, in which 'basic programming functionality' such as code compilation and syntax checking is at the least abstract level and software architecture analysis and design are at the most abstract level. We conclude by discussing the challenges that future AI-supported code completion tools must overcome to reach the design level of abstraction in our taxonomy.
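To make the notion of a language idiom concrete, the following is a minimal illustrative sketch of the kind of gap the abstract refers to: two Python functions that compute the same result, one written in a non-idiomatic style and one using a list comprehension. The function names and the task are hypothetical and are not drawn from the study's test scenarios or from actual Copilot output.

```python
# Illustrative only: hypothetical functions, not taken from the paper's test set.

def squares_of_evens_loop(numbers):
    """Non-idiomatic style: index-based loop with manual appends."""
    result = []
    for i in range(len(numbers)):
        if numbers[i] % 2 == 0:
            result.append(numbers[i] ** 2)
    return result


def squares_of_evens_idiomatic(numbers):
    """Idiomatic style: a list comprehension expresses the same filter-and-map."""
    return [n ** 2 for n in numbers if n % 2 == 0]


if __name__ == "__main__":
    sample = [1, 2, 3, 4, 5, 6]
    # Both versions are functionally equivalent; only the style differs.
    print(squares_of_evens_loop(sample) == squares_of_evens_idiomatic(sample))
```

Both versions are syntactically valid and produce the same list, which is why a tool can succeed at the syntax level while still failing to suggest the idiomatic form.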
