SecureCode v2.0: A Production-Grade Dataset for Training Security-Aware Code Generation Models

Scott Thornton

TL;DR

SecureCode v2.0 addresses the security gap in AI-generated code by grounding training data in real incidents and production workflows. It introduces a 4-turn conversational format with vulnerable/secure code, attack demonstrations, and defense-in-depth guidance. The dataset spans 11 languages and 11 vulnerability categories, with automated validation and language-specific fidelity, and is open-sourced. The work demonstrates that incident-grounded, production-focused training data can improve security outcomes for AI-assisted development.

Abstract

AI assistants produce vulnerable code in 45% of security-relevant scenarios, introducing flaws into production systems at scale. Yet existing secure coding datasets fall short. They lack incident grounding, don't provide the scale modern training requires, and miss the operational security context developers need for production deployments. We present SecureCode v2.0, a production-grade dataset of 1,215 security-focused coding examples that passed structural validation and expert security review. Every example ties to actual documented security incidents with CVE references, provides vulnerable and secure implementations, demonstrates concrete attacks, and includes defense-in-depth operational guidance. The dataset covers 11 vulnerability categories (complete OWASP Top 10:2025 plus AI/ML Security Threats) across 11 languages (Python, JavaScript, Java, Go, PHP, C#, TypeScript, Ruby, Rust, Kotlin, and YAML for infrastructure-as-code). Our quality assurance framework ensures complete incident grounding. Each example includes SIEM integration strategies, infrastructure hardening recommendations (Docker, AppArmor, WAF configurations), and testing approaches using language-appropriate frameworks. The dataset uses a 4-turn conversational structure mirroring actual developer-AI interactions, escalating from basic implementations to advanced security considerations and defense-in-depth guidance. Our contributions: (1) 1,215 rigorously validated examples split into training (989), validation (122), and test (104) sets, (2) an automated validation framework ensuring dataset consistency, (3) a 4-turn conversational structure capturing realistic security workflows, (4) comprehensive operational security guidance with SIEM integration strategies, (5) complete language-specific implementation fidelity, and (6) open-source release of data, validation tools, and benchmarking protocols.

Paper Structure

This paper contains 22 sections, 5 figures, 1 table.

Figures (5)

  • Figure 1: Four-Turn Conversational Format. Mirrors realistic developer-AI workflows. Turn 1: feature requests. Turn 2: vulnerable + secure implementations with attacks. Turn 3: advanced scenarios. Turn 4: operational security guidance.
  • Figure 2: Coverage Snapshot. Dataset composition across three dimensions: vulnerability categories (left), language distribution (center), and severity mix (right). Dataset splits: 989 train / 122 validation / 104 test examples.
  • Figure 3: Dataset Comparison. SecureCode v2.0 vs. related work across four dimensions: dataset size (blue), language coverage (pink), incident grounding (orange), and conversational format (green). SecureCode v2.0 achieves 100% incident grounding and is the only conversational dataset.
  • Figure 4: Dataset Construction Pipeline. Five-stage progression from 2,847 incident candidates to 1,215 final examples. Verification gates ensure no CVE overlap across splits, no near-duplicate pairs (Jaccard > 0.8), and preserved split group integrity.
  • Figure 5: Weekly Compliance Progress. Six-week improvement from 47.2% to 100% compliance. Blue line shows compliance rate; pink bars show weekly fixes (679 total). Week 1 required most remediation (312 CVE format fixes).
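
The verification gates named in Figure 4 (no CVE shared across splits, no pair with Jaccard similarity above 0.8) can be illustrated with a minimal sketch. This is not the authors' validation tooling: the record fields (`id`, `split`, `cve`, `text`), the whitespace tokenization, and the placeholder CVE identifiers are all assumptions made for illustration, and the paper may compute similarity over different units.

```python
from itertools import combinations


def jaccard(a: str, b: str) -> float:
    """Jaccard similarity of the lowercase whitespace-token sets of two strings.

    Whitespace tokenization is an assumption; the source only names the metric.
    """
    set_a, set_b = set(a.lower().split()), set(b.lower().split())
    if not set_a and not set_b:
        return 1.0  # two empty texts are treated as identical
    return len(set_a & set_b) / len(set_a | set_b)


def near_duplicates(examples, threshold=0.8):
    """Return (id, id) pairs whose text similarity exceeds the threshold."""
    return [
        (x["id"], y["id"])
        for x, y in combinations(examples, 2)
        if jaccard(x["text"], y["text"]) > threshold
    ]


def cve_overlap(examples):
    """Return CVE identifiers that appear in more than one split."""
    splits_by_cve = {}
    for ex in examples:
        splits_by_cve.setdefault(ex["cve"], set()).add(ex["split"])
    return sorted(cve for cve, splits in splits_by_cve.items() if len(splits) > 1)


# Hypothetical records; the CVE identifiers are placeholders, not real entries.
examples = [
    {"id": "a", "split": "train", "cve": "CVE-0000-0001",
     "text": "select user by id using string concatenation in query"},
    {"id": "b", "split": "test", "cve": "CVE-0000-0001",
     "text": "select user by id using string concatenation in the query"},
    {"id": "c", "split": "validation", "cve": "CVE-0000-0002",
     "text": "deserialize untrusted yaml with a safe loader"},
]

print(near_duplicates(examples))  # pairs that would fail the duplicate gate
print(cve_overlap(examples))      # CVEs that would fail the split-leakage gate
```

The pairwise comparison is quadratic in the number of examples, which is tolerable at the scale of 1,215 records; a larger corpus would likely call for an approximate method such as MinHash.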