
Assessing AI-Based Code Assistants in Method Generation Tasks

Vincenzo Corso, Leonardo Mariani, Daniela Micucci, Oliviero Riganelli

TL;DR

The paper evaluates AI-based code assistants on method generation tasks. It builds a dataset of 100 real-world Java methods and prompts four assistants (GitHub Copilot, Tabnine, ChatGPT, and Google Bard) with method comments and signatures. The generated code is assessed against five criteria (functional correctness, McCabe complexity, execution efficiency, code size, and similarity to developer-written code) using static metrics, tests, and statistical comparisons. Copilot is the most effective for correctness, but overall correctness remains limited, especially for methods with inter-class dependencies. Generated code tends to be a good starting point but requires substantial revision, and efficiency and similarity vary across tools. The study suggests using the tools in a complementary way and focusing on dependency handling to improve the practical viability of AI-assisted code generation.

Abstract

AI-based code assistants are increasingly popular as a means to enhance productivity and improve code quality. This study compares four AI-based code assistants, GitHub Copilot, Tabnine, ChatGPT, and Google Bard, in method generation tasks, assessing their ability to produce accurate, correct, and efficient code. Results show that code assistants are useful, with complementary capabilities, although they rarely generate ready-to-use correct code.
