SecureReviewer: Enhancing Large Language Models for Secure Code Review through Secure-aware Fine-tuning
Fang Liu, Simiao Liu, Yinghao Zhu, Xiaoli Lian, Li Zhang
TL;DR
SecureReviewer addresses the gap in automated secure code review in three ways: it constructs a security-focused dataset, fine-tunes LLMs with a secure-aware (SA) weighted loss, and grounds generation through retrieval-augmented generation (RAG). The approach achieves state-of-the-art results in both security-issue detection and the quality of security-oriented review comments, as validated by automatic metrics and human evaluation. Key contributions include a scalable data pipeline with GPT-4o-assisted refinement, a novel SA loss formulation, and SecureBLEU, an evaluation metric that combines linguistic quality with security relevance. The work demonstrates practical improvements for secure software development and highlights the value of domain adaptation and retrieval grounding in LLM-based code review.
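To make the idea of a "weighted loss" concrete, the sketch below shows one generic form: a token-level negative log-likelihood in which tokens flagged as security-relevant receive a larger weight. This is an illustrative assumption, not the paper's SA loss, whose exact formulation is not given in this section. The function name `weighted_token_nll`, the boolean mask `is_security_token`, and the factor `security_weight` are all hypothetical.

```python
import math
from typing import Sequence


def weighted_token_nll(
    log_probs: Sequence[float],         # log p(y_t | y_<t, x) for each target token
    is_security_token: Sequence[bool],  # hypothetical mask: True for security-relevant tokens
    security_weight: float = 2.0,       # hypothetical up-weighting factor (assumption)
) -> float:
    """Weighted mean negative log-likelihood over target tokens.

    Generic illustration only: security-relevant tokens contribute
    `security_weight` times as much as ordinary tokens, and the sum is
    normalized by the total weight so the scale stays comparable.
    """
    if len(log_probs) != len(is_security_token):
        raise ValueError("log_probs and is_security_token must have the same length")
    if not log_probs:
        raise ValueError("cannot compute a loss over an empty sequence")
    weights = [security_weight if flag else 1.0 for flag in is_security_token]
    weighted_nll = sum(-w * lp for w, lp in zip(weights, log_probs))
    return weighted_nll / sum(weights)


if __name__ == "__main__":
    # Two target tokens; the second is (hypothetically) security-relevant.
    lps = [math.log(0.5), math.log(0.25)]
    print(weighted_token_nll(lps, [False, True], security_weight=2.0))
```

With `security_weight=1.0` the function reduces to the ordinary mean cross-entropy, so the weighting only shifts the model's attention toward the flagged tokens and leaves the rest of the objective unchanged.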
Abstract
Identifying and addressing security issues during the early phases of the development lifecycle is critical for mitigating long-term negative impacts on software systems. Code review is an effective practice that enables developers to check their teammates' code before it is integrated into the codebase. To streamline the generation of review comments, various automated code review approaches have been proposed, among which large language model (LLM)-based methods have significantly advanced the capabilities of automated review generation. However, existing models primarily focus on general-purpose code review, and their effectiveness in identifying and addressing security-related issues remains underexplored. Moreover, adapting existing code review approaches to target security issues faces substantial challenges, including data scarcity and inadequate evaluation metrics. To address these limitations, we propose SecureReviewer, a new approach designed to enhance LLMs' ability to identify and resolve security-related issues during code review. Specifically, we first construct a dataset tailored for training and evaluating secure code review capabilities. Leveraging this dataset, we fine-tune LLMs with our proposed secure-aware fine-tuning strategy so that they generate code review comments that effectively identify security issues and provide fix suggestions. To mitigate hallucination in LLMs and enhance the reliability of their outputs, we integrate retrieval-augmented generation (RAG), which grounds the generated comments in domain-specific security knowledge. Additionally, we introduce SecureBLEU, a new evaluation metric designed to assess the effectiveness of review comments in addressing security issues. Experimental results demonstrate that SecureReviewer outperforms state-of-the-art baselines in both security issue detection accuracy and the overall quality and practical utility of generated review comments.
