From Copilot to Pilot: Towards AI Supported Software Development

Rohith Pudari, Neil A. Ernst

TL;DR

The paper assesses the current capabilities and limits of AI-supported code completion, using Copilot as a representative tool, to understand how far such systems can support software engineering beyond basic code generation. It introduces a six-level software abstraction taxonomy and empirically evaluates Copilot on Pythonic idioms and JavaScript best practices, revealing strong syntax success but limited idiom alignment and insufficient handling of code smells and design-level tasks. The findings highlight that while AI-assisted tools can accelerate writing correct code, they struggle with idiomatic usage, architectural decisions, and multi-file design, underscoring the need for higher-level reasoning and curated training data. The study outlines implications for practitioners and researchers, emphasizing data quality, multi-file context, and design-aware generation as essential directions for advancing AI-supported software development tools.

Abstract

AI-supported programming has arrived, as shown by the introduction and successes of large language models for code, such as Copilot/Codex (GitHub/OpenAI) and AlphaCode (DeepMind). Performance above the human average on programming challenges is now possible. However, software engineering is much more than solving programming contests. Moving beyond code completion to AI-supported software engineering will require an AI system that can, among other things, understand how to avoid code smells, follow language idioms, and eventually (maybe!) propose rational software designs. In this study, we explore the current limitations of AI-supported code completion tools like Copilot and offer a simple taxonomy for classifying AI-supported code completion tools in this space. We first perform an exploratory study on Copilot's code suggestions for language idioms and code smells. In most of our test scenarios, Copilot neither follows language idioms nor avoids code smells. We then conduct additional investigation to determine the current boundaries of AI-supported code completion tools like Copilot by introducing a taxonomy of software abstraction hierarchies in which 'basic programming functionality', such as code compilation and syntax checking, is at the least abstract level and software architecture analysis and design are at the most abstract level. We conclude by discussing the challenges for future development of AI-supported code completion tools to reach the design level of abstraction in our taxonomy.
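
To make "language idiom" concrete, the sketch below contrasts a non-idiomatic loop with the list comprehension idiom that the study evaluates. It is only an illustration: the function names and the sample task are hypothetical and are not taken from the paper's test scenarios or from actual Copilot output.

```python
def squares_of_evens_loop(numbers):
    """Non-idiomatic: build the result with an explicit loop and append."""
    result = []
    for n in numbers:
        if n % 2 == 0:
            result.append(n * n)
    return result


def squares_of_evens_idiomatic(numbers):
    """Pythonic: the same filter-and-map written as one list comprehension."""
    return [n * n for n in numbers if n % 2 == 0]


# Hypothetical sample input; both versions must agree on the result.
sample = [1, 2, 3, 4, 5, 6]
assert squares_of_evens_loop(sample) == squares_of_evens_idiomatic(sample)
```

Both functions are syntactically valid and return the same result, which is why checking an idiom requires more than checking correctness: the difference lies in which form the language community prefers.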

Paper Structure (28 sections, 8 figures, 2 tables)

Figures (8)

  • Figure 1: List comprehension Pythonic idiom and Copilot top suggestion.
  • Figure 2: Best practice for copying array contents and Copilot top suggestion.
  • Figure 3: Koopman's Autonomous Vehicle Safety Hierarchy of Needs. SOTIF = safety of the intended function.
  • Figure 4: Hierarchy of software abstractions. Copilot cleared all green levels and struggled in red levels.
  • Figure 5: Code suggestion of AI-supported code completion tools at syntax level.
  • ...and 3 more figures