Advancing GenAI-Assisted Programming: A Comparative Study on Prompt Efficiency and Code Quality Between GPT-4 and GLM-4
Angus Yang, Zehan Li, Jie Li
TL;DR
The paper examines how to get the most out of GenAI-assisted programming by comparing GPT-4 and GLM-4 across prompting strategies, using a Snake game module as the testbed. It finds that simple, direct prompts yield the highest one-shot success rate, and that a brief preliminary confirmation step improves results further, likely because it elicits a Chain-of-Thought (CoT) style rationale. In a simplified evaluation model, the study reports a 30- to 100-fold increase in code-generation efficiency over traditional coding norms, and it argues that GenAI-assisted coding points toward a paradigm shift toward AI-supervised development, a view supported by a citizen-science GenAI Coding Workshop. These findings offer operational norms for prompt design and model selection, with practical implications for accessibility, education, and the software-development landscape.
Abstract
This study explores best practices for using GenAI as a programming tool through a comparative analysis of GPT-4 and GLM-4. By evaluating prompting strategies at different levels of complexity, we find that the simplest, most straightforward prompting strategy yields the best code-generation results. Adding a CoT-like preliminary confirmation step further increases the success rate. Our results show that while GPT-4 marginally outperforms GLM-4, the difference is minimal for average users. In our simplified evaluation model, we observe a 30- to 100-fold increase in code-generation efficiency over traditional coding norms. Our GenAI Coding Workshop highlights the effectiveness and accessibility of the prompting methodology developed in this study. We expect GenAI-assisted coding to trigger a paradigm shift in the programming landscape, requiring developers to take on new roles centered on supervising and guiding GenAI, and to focus more on setting high-level objectives and on innovation.
