Advancing GenAI Assisted Programming--A Comparative Study on Prompt Efficiency and Code Quality Between GPT-4 and GLM-4

Angus Yang; Zehan Li; Jie Li

Advancing GenAI Assisted Programming--A Comparative Study on Prompt Efficiency and Code Quality Between GPT-4 and GLM-4

Angus Yang, Zehan Li, Jie Li

TL;DR

The paper addresses how to maximize GenAI-assisted programming by comparing GPT-4 and GLM-4 across prompting strategies, using a Snake game module as the testbed. It demonstrates that simple, direct prompts yield the highest one-shot success, and a brief preliminary confirmation step further improves results, with a Chain-of-Thought rationale likely driving this gain. The study reports a 30 to 100-fold increase in code-generation efficiency and documents a paradigm shift toward AI-supervised development, supported by a citizen-science GenAI Coding Workshop. These findings offer operational norms for prompt design and model selection, highlighting practical implications for accessibility, education, and the software-development landscape.

Abstract

This study aims to explore the best practices for utilizing GenAI as a programming tool, through a comparative analysis between GPT-4 and GLM-4. By evaluating prompting strategies at different levels of complexity, we identify that simplest and straightforward prompting strategy yields best code generation results. Additionally, adding a CoT-like preliminary confirmation step would further increase the success rate. Our results reveal that while GPT-4 marginally outperforms GLM-4, the difference is minimal for average users. In our simplified evaluation model, we see a remarkable 30 to 100-fold increase in code generation efficiency over traditional coding norms. Our GenAI Coding Workshop highlights the effectiveness and accessibility of the prompting methodology developed in this study. We observe that GenAI-assisted coding would trigger a paradigm shift in programming landscape, which necessitates developers to take on new roles revolving around supervising and guiding GenAI, and to focus more on setting high-level objectives and engaging more towards innovation.

Advancing GenAI Assisted Programming--A Comparative Study on Prompt Efficiency and Code Quality Between GPT-4 and GLM-4

TL;DR

Abstract

Paper Structure (25 sections, 5 figures, 3 tables)

This paper contains 25 sections, 5 figures, 3 tables.

Introduction
Literature Review
Overview of Existing Studies on GenAI Applications in Programming
Analysis of the Impact of Prompt Scheme, Complexity, and Clarity on GenAI Performance
Comparative Analysis of Different LLMs in Programming Contexts
Gap in Literature Regarding Operational Norms for GenAI-Assisted Programming
Concluding Remarks
Methods
Research Design
Evaluation Criteria
Prompt Levels Design
One-shot prompt
Follow-up prompt setup
Data Collection
Experiment Procedure
...and 10 more sections

Figures (5)

Figure 1: Representative screenshots of Snake games programmed in Python v3.11, assisted by GenAI: GPT-4 (left) and GLM-4 (right)
Figure 2: One-shot prompt success rate for GPT-4
Figure 3: One-shot prompt success rate for GLM-4
Figure 4: GenAI Coding Workshop at DeepBlue Technology (February 2024) - participant groups present their projects: (a) Breakout game, (b) Snake game with difficulty level selection, (c) Tetris game, and (d) Five-in-a-row game.
Figure :

Advancing GenAI Assisted Programming--A Comparative Study on Prompt Efficiency and Code Quality Between GPT-4 and GLM-4

TL;DR

Abstract

Advancing GenAI Assisted Programming--A Comparative Study on Prompt Efficiency and Code Quality Between GPT-4 and GLM-4

Authors

TL;DR

Abstract

Table of Contents

Figures (5)