Automatic Programming: Large Language Models and Beyond

Michael R. Lyu, Baishakhi Ray, Abhik Roychoudhury, Shin Hwei Tan, Patanamon Thongtanunam

TL;DR

Automatic programming addresses the challenge of turning natural-language intents into correct, trustworthy code while balancing quality, security, and responsibility. The paper surveys foundational and modern approaches to code generation and program repair, highlighting how LLMs augment (and complicate) automatic coding, and discusses trust, safety, and maintenance in practical settings. It proposes a multi-faceted toolbox including constraint-based repair, lightweight analysis, and evidence-based repair to raise confidence in LLM-generated code, and outlines a near- to mid-term programming environment where developers act as designers and QA specialists aided by LLM agents. The findings underscore the need for systematic evaluation, robust tooling, and evolving programming workflows to safely harness autonomous programming at scale.

Abstract

Automatic programming has seen increasing popularity due to the emergence of tools like GitHub Copilot, which rely on Large Language Models (LLMs). At the same time, automatically generated code faces challenges during deployment due to concerns around quality and trust. In this article, we study automated coding in a general sense and examine the concerns around code quality, security, and related issues of programmer responsibility. These are key issues for organizations deciding on the usage of automatically generated code. We discuss how advances in software engineering such as program repair and analysis can enable automatic programming. We conclude with a forward-looking view, focusing on the programming environment of the near future, where programmers may need to switch to different roles to fully utilize the power of automatic programming. Automated repair of automatically generated programs from LLMs can help produce higher-assurance code from LLMs, along with evidence of assurance.

Paper Structure (26 sections, 8 equations, 6 figures, 2 tables)

This paper contains 26 sections, 8 equations, 6 figures, 2 tables.

Figures (6)

  • Figure 1: Triangle program from cacm19 and the test data accompanying the program
  • Figure 2: Inferring Specifications for Program Repair (ack. unpublished article repair-arxiv)
  • Figure 3: The program for a LeetCode programming task generated by Codex.
  • Figure 4: Example repair generated by GPT-4 (abbreviated)
  • Figure 5: Evolution of programmer roles captured by DALL-E; programmer role (a) as code composer and designer instead of code writer, (b) as quality assurance specialist instead of code writer.
  • ...and 1 more figure