A Comparative Study of Code Generation using ChatGPT 3.5 across 10 Programming Languages
Alessio Buscemi
TL;DR
This study evaluates ChatGPT 3.5's ability to generate runnable code across 10 languages and 4 domains using 40 tasks, highlighting substantial non-determinism and language-dependent performance. It employs a controlled API-based setup with a fixed prompting strategy and analyzes executability, generation time, and code length across languages and task categories. Julia achieves the highest executable rate while C++ performs poorly, and high-level languages are generally more amenable to code generation than low-level ones. The study also notes ethical and operational limitations that affect the model's outputs. The work discusses implications for language evolution and industry adoption, and argues for standardized, multi-language benchmarking to assess LLM-assisted code generation fairly and to guide future research and policy. The results suggest LLMs could disrupt software development workflows, driving efficiency while necessitating reskilling and ethical governance.
Abstract
Large Language Models (LLMs) are advanced Artificial Intelligence (AI) systems trained on large datasets to understand and produce human-like language. These models have become proficient enough to pass university exams across several disciplines and to generate functional code for novel problems. This research investigates the coding proficiency of ChatGPT 3.5, an LLM released by OpenAI in November 2022 that has gained significant recognition for its text generation and code creation capabilities. The model's skill in creating code snippets is evaluated across 10 programming languages and 4 software domains. The findings reveal major unexpected behaviors and limitations of the model. This study aims to identify potential areas for development and to examine the ramifications of automated code generation for the evolution of programming languages and for the tech industry.
