Optimizing AI-Assisted Code Generation
Simon Torka, Sahin Albayrak
TL;DR
The paper addresses the security, reliability, functionality, and accessibility challenges of AI-assisted code generation based on large language models. It surveys the current state of the field, including encoder–decoder architectures and existing platforms, and analyzes risks such as data poisoning, prompt injection, and vulnerabilities in generated code. It then proposes a conceptual framework built on an encoder–decoder setup (e.g., CodeT5, CodeBERT) combined with zero-trust data practices, iterative runtime learning, prompt-support tools, and defense-in-depth. The framework is intended to let non-experts produce both secure conventional code and secure AI models. The work emphasizes human-centered design, governance, and evaluation, aiming to realize AI4G while ensuring safe, high-quality software development.
Abstract
In recent years, the rise of AI-assisted code-generation tools has significantly transformed software development. While code generators have so far mainly supported conventional software development, their use is expected to extend to the creation of powerful and secure AI systems. Systems capable of generating code, such as ChatGPT, OpenAI Codex, GitHub Copilot, and AlphaCode, take advantage of advances in machine learning (ML) and natural language processing (NLP) enabled by large language models (LLMs). These models work probabilistically: although they can generate complex code from natural language input, there is no guarantee that the generated code is functional or secure. To fully exploit the considerable potential of this technology, the security, reliability, functionality, and quality of the generated code must therefore be ensured. This paper examines how far these goals have been achieved to date and explores strategies for improving on them. In addition, we explore how these systems can be optimized to create safe, high-performance, and executable artificial intelligence (AI) models, and consider how to improve their accessibility to make AI development more inclusive and equitable.
