How Secure is Code Generated by ChatGPT?

Raphaël Khoury, Anderson R. Avila, Jacob Brunelle, Baba Mamadou Camara

TL;DR

The paper systematically evaluates the security of code generated by ChatGPT by eliciting 21 programs across five languages, probing each for vulnerabilities, and then attempting to obtain secure revisions. It finds that ChatGPT often generates insecure code, though it can be guided toward secure versions with explicit prompts and iterative testing. The study discusses ethical considerations, pedagogical value, and practical strategies such as automated testing to improve security, and concludes that human oversight remains essential. Overall, ChatGPT is a useful educational and development aid but not yet a reliable substitute for security-aware programming. The work covers diverse vulnerability classes, from SQL injection and deserialization to memory safety and cryptographic misuse, and demonstrates the need for structured evaluation when deploying AI-generated code.
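
To make one of the vulnerability classes named above concrete, the following is a minimal sketch of SQL injection and its usual fix, written with Python's standard `sqlite3` module. It is not taken from the paper: the table layout and the names `make_db`, `find_user_unsafe`, and `find_user_safe` are hypothetical and chosen only for illustration.

```python
import sqlite3


def make_db():
    """Build a small in-memory database (hypothetical schema for illustration)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (name TEXT, secret TEXT)")
    conn.executemany(
        "INSERT INTO users VALUES (?, ?)",
        [("alice", "a-secret"), ("bob", "b-secret")],
    )
    return conn


def find_user_unsafe(conn, name):
    # Vulnerable: user input is concatenated directly into the SQL text,
    # so a crafted name can change the structure of the query.
    query = "SELECT name, secret FROM users WHERE name = '" + name + "'"
    return conn.execute(query).fetchall()


def find_user_safe(conn, name):
    # Safer: the input is bound as a parameter and is never parsed as SQL.
    query = "SELECT name, secret FROM users WHERE name = ?"
    return conn.execute(query, (name,)).fetchall()


if __name__ == "__main__":
    conn = make_db()
    payload = "x' OR '1'='1"
    print("unsafe:", find_user_unsafe(conn, payload))
    print("safe:  ", find_user_safe(conn, payload))
```

With the payload `x' OR '1'='1`, the concatenated query matches every row, while the parameterized query treats the whole payload as a literal name and matches none. This is the kind of difference that explicit security prompts and automated testing are meant to surface.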

Abstract

In recent years, large language models have been responsible for great advances in the field of artificial intelligence (AI). ChatGPT in particular, an AI chatbot developed and recently released by OpenAI, has taken the field to the next level. The conversational model is able not only to process human-like text, but also to translate natural language into code. However, the safety of programs generated by ChatGPT should not be overlooked. In this paper, we perform an experiment to address this issue. Specifically, we ask ChatGPT to generate a number of programs and evaluate the security of the resulting source code. We further investigate whether ChatGPT can be prodded to improve the security by appropriate prompts, and discuss the ethical aspects of using AI to generate code. Results suggest that ChatGPT is aware of potential vulnerabilities, but nonetheless often generates source code that is not robust to certain attacks.

Paper Structure (22 sections, 1 figure, 1 table)

Figures (1)

  • Figure 1: Code generation by ChatGPT followed by vulnerability check.