SecCodePRM: A Process Reward Model for Code Security

Weichen Yu; Ravi Mangal; Yinyi Luo; Kai Hu; Jingxuan He; Corina S. Pasareanu; Matt Fredrikson

TL;DR

We propose SecCodePRM, a security-oriented process reward model that assigns a context-aware, step-level security score along a code trajectory. It outperforms prior approaches in all three settings (full-code vulnerability detection, partial-code vulnerability detection, and secure code generation) while preserving functional correctness, suggesting improved security without a safety-utility tradeoff.

Abstract

Large language models (LLMs) are rapidly becoming core components of modern software development workflows, yet ensuring code security remains challenging. Existing vulnerability detection pipelines either rely on static analyzers or use LLM/GNN-based detectors trained with coarse program-level supervision. Both families often require complete context, provide sparse end-of-completion feedback, and can degrade as code length grows, making them ill-suited for real-time, prefix-level assessment during interactive coding and streaming generation. We propose SecCodePRM, a security-oriented process reward model that assigns a context-aware, step-level security score along a code trajectory. To train the model, we derive step-level supervision labels from static analyzers and expert annotations, allowing the model to attend more precisely to fine-grained regions associated with inter-procedural vulnerabilities. SecCodePRM has three applications: full-code vulnerability detection (VD), partial-code VD, and secure code generation (CG). For VD, SecCodePRM uses risk-sensitive aggregation that emphasizes high-risk steps; for CG, SecCodePRM supports inference-time scaling by ranking candidate continuations and favoring those with higher cumulative reward. This design yields dense, real-time feedback that scales to long-horizon generation. Empirically, SecCodePRM outperforms prior approaches in all three settings while preserving functional correctness, suggesting improved security without a safety-utility tradeoff.
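
The abstract names two uses of the step scores: risk-sensitive aggregation for detection and cumulative-reward ranking for generation. The sketch below illustrates both under stated assumptions and is not the paper's exact formulation. We assume each step receives a security score in [0, 1] (1 = secure), and we use a softmax-weighted average of step risks as one possible risk-sensitive aggregator. The names `risk_weighted_aggregate`, `is_vulnerable`, `rank_continuations`, `beta`, and `threshold` are hypothetical.

```python
import math
from typing import Callable, List, Sequence


def risk_weighted_aggregate(step_scores: Sequence[float], beta: float = 5.0) -> float:
    """Combine per-step security scores into one program-level risk (hypothetical aggregator).

    Assumes step_scores[t] is a security score in [0, 1], where 1 means secure.
    Each step's risk is 1 - score. Risks are averaged with softmax weights
    exp(beta * risk), so high-risk steps dominate: beta = 0 gives the plain
    mean risk, and a large beta approaches the maximum step risk.
    """
    if not step_scores:
        raise ValueError("need at least one step score")
    risks = [1.0 - s for s in step_scores]
    m = max(risks)  # subtract the max before exponentiating for numerical stability
    weights = [math.exp(beta * (r - m)) for r in risks]
    return sum(w * r for w, r in zip(weights, risks)) / sum(weights)


def is_vulnerable(step_scores: Sequence[float], threshold: float = 0.5, beta: float = 5.0) -> bool:
    """Flag a program when its aggregated risk exceeds an assumed threshold."""
    return risk_weighted_aggregate(step_scores, beta) > threshold


def rank_continuations(
    candidates: List[str], score_steps: Callable[[str], List[float]]
) -> List[str]:
    """Sort candidate continuations by cumulative step reward, highest first.

    score_steps is a stand-in for the reward model: it maps a candidate to its
    per-step security scores, which are summed into a cumulative reward.
    """
    return sorted(candidates, key=lambda c: sum(score_steps(c)), reverse=True)
```

With the toy scores `[0.9, 0.95, 0.1, 0.9]`, the plain mean risk (`beta=0`) is about 0.29 and would not cross a 0.5 threshold, whereas `beta=5` gives about 0.86 because the single risky step dominates. Summing raw step scores also favors longer candidates, so a length-normalized or margin-based reward may be preferable in practice.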

Paper Structure (31 sections, 11 equations, 10 figures, 12 tables)

Introduction
SecCodePRM's Diverse Applications.
Related Work
SecCodePRM
The Semantic Gap Between Human Expertise and Automated Vulnerability Detection
Problem Setup
Secure Code Process Reward Modeling
Step-level Margin Scoring.
Reweighted Aggregation for Detection.
Dataset Construction and Refinement
Training Datasets Analysis
Experiments
Vulnerability Detection on Full Code Samples
Vulnerability Detection on Partial Code Samples
Secure Code Generation with Inference Time Scaling
...and 16 more sections

Figures (10)

Figure 1: Performance comparison on vulnerability detection (VD) and code generation (CG). Compared with off-the-shelf LLMs, PRMs, and SOTA VD methods, SecCodePRM consistently outperforms prior approaches across three core settings (VD on complete programs, VD on partial code prefixes, and safety-guided CG) without sacrificing general CG performance.
Figure 2: Human vs. automated vulnerability detection. Left: a CWE example; we label the vulnerable code and the transitive closure of its caller functions. Right: box plot showing that expert humans identify flaws using only $\sim 60\%$ of the code tokens, read from the beginning. Flow analysis usually fails on partial code and requires additional information.
Figure 3: Pipeline Comparison.
Figure 4: Dataset construction, cleaning, labeling pipeline.
Figure 5: Token length and accuracy distribution on PrimeVul.
...and 5 more figures
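
The Figure 2 caption states that labels cover the vulnerable code and the transitive closure of its caller functions. A minimal sketch of that label propagation follows, assuming a call graph is already available as a mapping from each function to the set of functions it calls. The function `label_with_callers` and the example function names are hypothetical; how the paper extracts call graphs is not described here.

```python
from collections import deque
from typing import Dict, Iterable, Set


def label_with_callers(call_graph: Dict[str, Set[str]], vulnerable: Iterable[str]) -> Set[str]:
    """Return the vulnerable functions plus every function that transitively calls one."""
    # Invert the call graph: callee -> set of its direct callers.
    callers: Dict[str, Set[str]] = {}
    for caller, callees in call_graph.items():
        for callee in callees:
            callers.setdefault(callee, set()).add(caller)

    # Breadth-first search upward through callers, starting at the vulnerable functions.
    labeled = set(vulnerable)
    queue = deque(labeled)
    while queue:
        fn = queue.popleft()
        for caller in callers.get(fn, ()):
            if caller not in labeled:  # visit each function once, so recursive cycles terminate
                labeled.add(caller)
                queue.append(caller)
    return labeled
```

For example, with `{"main": {"parse"}, "parse": {"copy_buf"}, "helper": {"log"}}` and `vulnerable={"copy_buf"}`, the result is `{"copy_buf", "parse", "main"}`; `helper` stays unlabeled because it never reaches the vulnerable function.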
