RefleXGen: The unexamined code is not worth using

Bin Wang; Hui Li; AoFan Liu; BoTao Yang; Ao Yang; YiLu Zhong; Weixiang Huang; Yanping Zhang; Runhuai Huang; Weimin Zeng

TL;DR

This work tackles the security of AI-generated code by introducing RefleXGen, a self-reflective framework that leverages Retrieval-Augmented Generation to iteratively improve code safety without fine-tuning or dataset creation. By maintaining a dynamic security knowledge base built from the model's reflections and secure snippets, RefleXGen guides subsequent code generation cycles toward safer outputs. Experimental results across GPT-3.5 Turbo, GPT-4o, CodeQwen, and Gemini show meaningful improvements in code security, demonstrating the practicality of self-reflection as a resource-efficient strategy for secure code generation. The approach highlights the potential of reflective mechanisms to autonomously enhance code safety in real-world deployments.

Abstract

Security in code generation remains a pivotal challenge when applying large language models (LLMs). This paper introduces RefleXGen, a method that enhances code security by integrating Retrieval-Augmented Generation (RAG) techniques with the guided self-reflection mechanisms inherent in LLMs. Traditional approaches rely on fine-tuning LLMs or developing specialized secure code datasets, both of which can be resource-intensive. RefleXGen instead iteratively optimizes the code generation process through self-assessment and reflection, without the need for extensive resources. Within this framework, the model continuously accumulates and refines its knowledge base, thereby progressively improving the security of the generated code. Experimental results demonstrate that RefleXGen enhances code security across multiple models, achieving a 13.6% improvement with GPT-3.5 Turbo, a 6.7% improvement with GPT-4o, a 4.5% improvement with CodeQwen, and a 5.8% improvement with Gemini. Our findings highlight that improving the quality of model self-reflection constitutes an effective and practical strategy for strengthening the security of AI-generated code.
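The loop described in the abstract (generate, reflect, store the reflection, retrieve it on the next round) can be illustrated with a minimal sketch. This is not the authors' implementation: `KnowledgeBase`, `reflexgen_sketch`, `toy_generate`, and `toy_reflect` are hypothetical names, the two `toy_*` functions are hard-coded stand-ins for LLM calls, and retrieval uses naive word overlap where a real RAG system would presumably use embedding search.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class KnowledgeBase:
    """Toy security knowledge base holding reflection notes as plain strings."""
    notes: List[str] = field(default_factory=list)

    def retrieve(self, query: str, k: int = 3) -> List[str]:
        # Rank notes by word overlap with the query (assumed stand-in for vector search).
        words = set(query.lower().split())
        ranked = sorted(
            self.notes,
            key=lambda note: len(words & set(note.lower().split())),
            reverse=True,
        )
        return ranked[:k]

    def add(self, note: str) -> None:
        if note not in self.notes:
            self.notes.append(note)


def reflexgen_sketch(
    task: str,
    generate: Callable[[str, List[str]], str],
    reflect: Callable[[str], List[str]],
    kb: KnowledgeBase,
    max_rounds: int = 3,
) -> str:
    """Generate code, self-reflect on it, and feed reflections back via retrieval."""
    code = ""
    for _ in range(max_rounds):
        hints = kb.retrieve(task)      # retrieval step
        code = generate(task, hints)   # generation conditioned on retrieved hints
        issues = reflect(code)         # self-reflection on the candidate
        if not issues:
            break                      # no issues found; stop iterating
        for issue in issues:
            kb.add(issue)              # accumulate security knowledge
    return code


def toy_generate(task: str, hints: List[str]) -> str:
    # Hypothetical model stub: emits a parameterized query only when hinted to.
    if any("parameterized" in hint for hint in hints):
        return 'cursor.execute("SELECT * FROM users WHERE name = ?", (name,))'
    return 'cursor.execute("SELECT * FROM users WHERE name = " + name)'


def toy_reflect(code: str) -> List[str]:
    # Hypothetical reflection stub: flags string concatenation inside a query.
    if '" + name' in code:
        return ["sql query: use parameterized statements instead of string concatenation"]
    return []


kb = KnowledgeBase()
final_code = reflexgen_sketch(
    "write a sql query that looks up a user by name", toy_generate, toy_reflect, kb
)
print(final_code)
```

In this toy run the first round produces a concatenated query, the reflection is stored, and the second round retrieves it and produces a parameterized query. The knowledge base persists across calls, which is the property the abstract attributes to the framework: later tasks can benefit from earlier reflections without any fine-tuning.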
