Secure-Instruct: An Automated Pipeline for Synthesizing Instruction-Tuning Datasets Using LLMs for Secure Code Generation
Junjie Li, Fazle Rabbi, Bo Yang, Song Wang, Jinqiu Yang
TL;DR
This paper tackles insecure code generation by Large Language Models (LLMs) and the limitations of prior data-driven approaches that rely on real vulnerability fixes. It introduces Secure-Instruct, an automated pipeline that synthesizes high-quality vulnerable and secure code examples from CWE documentation, using two data-generation schemes and two instruction-tuning strategies, with verification by the static analyzers CodeQL and SonarQube. Evaluated on two security benchmarks (the authors' own CWEBench and the existing CWEval), Secure-Instruct shows consistent improvements in secure code generation and functional correctness across four representative LLMs, significantly outperforming the state-of-the-art SafeCoder. The work demonstrates scalable, cost-effective generation of training data without manual curation, and provides replication materials and datasets to facilitate broader adoption and extension to additional CWEs and languages.
Abstract
Although Large Language Models (LLMs) show promise for automated code generation, they often produce insecure code that threatens software security. Current approaches to improving secure code generation (e.g., SafeCoder) are limited by small, imbalanced instruction-tuning datasets. In this work, we present Secure-Instruct, a novel pipeline that automatically synthesizes high-quality vulnerable and secure code examples and instruction-tunes LLMs to align task descriptions with secure code generation abilities. We evaluate Secure-Instruct on four representative LLMs using two security-related benchmarks: our own CWEBench and the existing CWEval. CWEBench comprises 93 scenarios on 44 CWEs, all without overlap with Secure-Instruct's synthetic instruction-tuning dataset, while CWEval covers 31 CWEs with 119 manually verified security-critical tasks. We find that Secure-Instruct improves both security and functional correctness in code generation. On CWEBench, Secure-Instruct substantially improves secure code generation, giving a 28.5% increase on average in secure ratio over the pretrained models and outperforming SafeCoder by 12.6%. On CWEval, Secure-Instruct achieves an increase of 157.3% for CodeLlama-7B and 46.4% for Mistral-7B in Func-Sec@1 over the pretrained models, and significantly outperforms SafeCoder.
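The two reported metrics can be illustrated with a minimal sketch. The definitions below are assumptions made for illustration, not the paper's evaluation code: secure ratio is taken to be the fraction of generated samples judged secure, and Func-Sec@1 the fraction of tasks whose single sample is both functionally correct and secure. The `Sample` record and all function names are hypothetical. The sketch also shows the relative-increase arithmetic; an increase of 157.3% in a ratio has to be a relative change, since a ratio cannot rise by more than 100 percentage points, whereas the abstract does not say whether the 28.5% figure is relative or absolute.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    """One generated completion for one benchmark task (hypothetical layout)."""
    task_id: str
    functional: bool  # passed the task's functional tests
    secure: bool      # judged secure by the security check


def secure_ratio(samples):
    """Fraction of samples judged secure (assumed definition)."""
    if not samples:
        raise ValueError("no samples")
    return sum(s.secure for s in samples) / len(samples)


def func_sec_at_1(samples):
    """Fraction of tasks whose sample is both functional and secure.

    Assumes exactly one sample per task, i.e. the k = 1 case.
    """
    if not samples:
        raise ValueError("no samples")
    if len({s.task_id for s in samples}) != len(samples):
        raise ValueError("expected exactly one sample per task")
    return sum(s.functional and s.secure for s in samples) / len(samples)


def relative_increase(before, after):
    """Relative change from `before` to `after`, in percent."""
    if before == 0:
        raise ValueError("relative increase is undefined for a zero baseline")
    return (after - before) / before * 100.0


# Toy data, not taken from the paper.
samples = [
    Sample("t1", functional=True, secure=True),
    Sample("t2", functional=True, secure=False),
    Sample("t3", functional=False, secure=True),
    Sample("t4", functional=True, secure=True),
]
```

On this toy data three of the four samples are secure, but only two are both functional and secure, which is why a joint metric such as Func-Sec@1 can sit well below the secure ratio.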
