
SecCodePRM: A Process Reward Model for Code Security

Weichen Yu, Ravi Mangal, Yinyi Luo, Kai Hu, Jingxuan He, Corina S. Pasareanu, Matt Fredrikson

TL;DR

We propose SecCodePRM, a security-oriented process reward model that assigns a context-aware, step-level security score along a code trajectory. It outperforms prior approaches in all three settings (full-code vulnerability detection, partial-code vulnerability detection, and secure code generation) while preserving functional correctness, suggesting improved security without a safety-utility tradeoff.

Abstract

Large language models (LLMs) are rapidly becoming core components of modern software development workflows, yet ensuring code security remains challenging. Existing vulnerability detection pipelines either rely on static analyzers or use LLM- or GNN-based detectors trained with coarse program-level supervision. Both families often require complete context, provide only sparse end-of-completion feedback, and can degrade as code length grows, making them ill-suited for real-time, prefix-level assessment during interactive coding and streaming generation. We propose SecCodePRM, a security-oriented process reward model that assigns a context-aware, step-level security score along a code trajectory. To train the model, we derive step-level supervision labels from static analyzers and expert annotations, allowing the model to attend more precisely to the fine-grained regions associated with inter-procedural vulnerabilities. SecCodePRM has three applications: full-code vulnerability detection (VD), partial-code VD, and secure code generation (CG). For VD, SecCodePRM uses risk-sensitive aggregation that emphasizes high-risk steps; for CG, it supports inference-time scaling by ranking candidate continuations and favoring those with higher cumulative reward. This design yields dense, real-time feedback that scales to long-horizon generation. Empirically, SecCodePRM outperforms prior approaches in all three settings while preserving functional correctness, suggesting improved security without a safety-utility tradeoff.
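The abstract names two uses of the step-level scores: risk-sensitive aggregation for VD and cumulative-reward ranking of candidate continuations for CG. The exact aggregation rule is not specified here, so the sketch below substitutes a temperature-controlled soft-min as one plausible risk-sensitive choice. Everything in it is an illustrative assumption, not the authors' implementation: `toy_prm` is a hypothetical stand-in for a trained PRM, and `score_trajectory`, `risk_sensitive_aggregate`, `rank_continuations`, the 0.05 temperature, and the 0.5 decision threshold are all hypothetical.

```python
import math
from typing import Callable, List, Sequence, Tuple

# Hypothetical PRM signature (not the paper's API): (prefix_steps, next_step) -> score
# in [0, 1], where 1.0 means "this step looks secure given the prefix".
StepScorer = Callable[[Sequence[str], str], float]


def toy_prm(prefix: Sequence[str], step: str) -> float:
    """Hypothetical stand-in for a trained PRM: flags unbounded strcpy calls."""
    return 0.1 if "strcpy(" in step else 0.9


def score_trajectory(steps: Sequence[str], prm: StepScorer) -> List[float]:
    """Score every step conditioned only on its prefix, as a streaming scorer would."""
    return [prm(steps[:i], step) for i, step in enumerate(steps)]


def risk_sensitive_aggregate(scores: Sequence[float], temperature: float = 0.05) -> float:
    """Soft-min of step scores (an assumed aggregation rule).

    Tends to min() as temperature -> 0 and to the plain mean as temperature grows,
    so a single low-scoring (high-risk) step dominates the program-level score.
    """
    if not scores:
        raise ValueError("need at least one step score")
    if temperature <= 0:
        return min(scores)
    m = min(scores)
    # Shift by the minimum so every exponent is <= 0 (numerically stable).
    mean_exp = sum(math.exp(-(s - m) / temperature) for s in scores) / len(scores)
    return m - temperature * math.log(mean_exp)


def rank_continuations(
    prefix: Sequence[str], candidates: Sequence[Sequence[str]], prm: StepScorer
) -> List[Tuple[float, Sequence[str]]]:
    """Rank candidate continuations by cumulative (summed) step reward, highest first."""
    ranked = []
    for cand in candidates:
        steps = list(prefix) + list(cand)
        # Only the newly proposed steps contribute to the continuation's reward.
        reward = sum(score_trajectory(steps, prm)[len(prefix):])
        ranked.append((reward, cand))
    ranked.sort(key=lambda pair: pair[0], reverse=True)
    return ranked


# Full-code VD: one risky step pulls the aggregate far below the plain mean.
program = ["char buf[8];", "strcpy(buf, input);", "return 0;"]
scores = score_trajectory(program, toy_prm)      # [0.9, 0.1, 0.9]
risk = risk_sensitive_aggregate(scores)          # about 0.155; the plain mean is about 0.633
print("vulnerable" if risk < 0.5 else "secure")  # 0.5 threshold is an assumption

# CG: choose among candidate next steps for the same prefix.
prefix = ["char buf[8];"]
candidates = [["strcpy(buf, input);"], ['snprintf(buf, sizeof(buf), "%s", input);']]
best_reward, best = rank_continuations(prefix, candidates, toy_prm)[0]
print(best_reward, best)  # picks the snprintf continuation
```

Summing step scores favors longer continuations; when candidates differ in length, averaging the reward per step is one possible normalization.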

Paper Structure

This paper contains 31 sections, 11 equations, 10 figures, and 12 tables.

Figures (10)

  • Figure 1: Performance comparison on vulnerability detection (VD) and code generation (CG). SecCodePRM consistently outperforms off-the-shelf LLMs, PRMs, and state-of-the-art (SOTA) VD methods across three core settings (VD on complete programs, VD on partial code prefixes, and safety-guided CG) without sacrificing general CG performance.
  • Figure 2: Human vs. automated vulnerability detection. Left: a CWE example; we label the vulnerable code and the transitive closure of its caller functions. Right: box plot showing that expert humans identify flaws using only $\sim 60\%$ of the code tokens from the beginning. Flow analysis, by contrast, usually fails on partial code and requires extra information.
  • Figure 3: Pipeline Comparison.
  • Figure 4: Dataset construction, cleaning, and labeling pipeline.
  • Figure 5: Token length and accuracy distribution on PrimeVul.
  • ...and 5 more figures
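Figure 2 describes labeling the vulnerable code together with the transitive closure of its caller functions. A minimal sketch of that closure follows, assuming the call graph is available as a caller-to-callees mapping and that each step can be attributed to a single enclosing function. The names `label_functions` and `step_labels`, the per-step mapping, and the example functions are hypothetical illustrations, not the authors' labeling pipeline.

```python
from collections import deque
from typing import Dict, Iterable, List, Set


def label_functions(call_graph: Dict[str, Iterable[str]], vulnerable: Iterable[str]) -> Set[str]:
    """Return the vulnerable functions plus every function that transitively calls one."""
    # Invert caller -> callees edges into callee -> callers.
    callers: Dict[str, Set[str]] = {}
    for caller, callees in call_graph.items():
        for callee in callees:
            callers.setdefault(callee, set()).add(caller)
    labeled = set(vulnerable)
    queue = deque(labeled)
    # Breadth-first walk up the reversed call graph; the membership check handles cycles.
    while queue:
        fn = queue.popleft()
        for caller in callers.get(fn, ()):
            if caller not in labeled:
                labeled.add(caller)
                queue.append(caller)
    return labeled


def step_labels(step_functions: List[str], labeled: Set[str]) -> List[int]:
    """Hypothetical step-level labels: 1 if the step's enclosing function is labeled."""
    return [1 if fn in labeled else 0 for fn in step_functions]


# Hypothetical example: parse_header holds the flaw.
call_graph = {
    "main": ["handle_request", "log_msg"],
    "handle_request": ["parse_header"],
    "parse_header": [],
    "log_msg": [],
}
labeled = label_functions(call_graph, {"parse_header"})
print(sorted(labeled))  # ['handle_request', 'main', 'parse_header']
print(step_labels(["main", "log_msg", "handle_request", "parse_header"], labeled))  # [1, 0, 1, 1]
```

Labeling callers as well as the flawed function marks every step on a call path that reaches the vulnerability, which is one way to expose inter-procedural context to a step-level model.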