From Generalist to Specialist: Exploring CWE-Specific Vulnerability Detection
Syafiq Al Atiiq, Christian Gehrmann, Kevin Dahlén, Karim Khalil
TL;DR
Vulnerability detection suffers from the heterogeneity of vulnerability types across CWEs and data imbalance, making single-label detectors prone to superficial cues. The authors propose CWE-specific classifiers and explore a multiclass integration to better capture CWE-specific code semantics, validated via an ablation study. CWE-specific models outperform a single binary detector on CWE-specific test sets but struggle to generalize across many CWEs due to false positives; a multiclass approach improves accuracy and F1 on targeted CWE subsets but retains precision challenges. The work provides practical insights for CWE-aware vulnerability detection and releases open-source resources to support reproducibility and future research.
Abstract
Vulnerability Detection (VD) using machine learning faces a significant challenge: the vast diversity of vulnerability types. Each Common Weakness Enumeration (CWE) represents a unique category of vulnerabilities with distinct characteristics, code semantics, and patterns. Treating all vulnerabilities as a single label in a binary classification setting may oversimplify the problem, because it fails to capture the nuances and context specific to each CWE. As a result, a single binary classifier may rely on superficial text patterns rather than learning the intricacies of each vulnerability type. Recent reports show that even state-of-the-art Large Language Models (LLMs) with hundreds of billions of parameters struggle to generalize when detecting vulnerabilities. Our work investigates a different approach that uses CWE-specific classifiers to address the heterogeneity of vulnerability types. We hypothesize that training a separate classifier for each CWE enables the models to capture the unique characteristics and code semantics of each vulnerability category. To test this hypothesis, we conduct an ablation study in which we train an individual classifier for each CWE and evaluate each one independently. Our results demonstrate that CWE-specific classifiers outperform a single binary classifier trained on all vulnerabilities. Building on this, we explore strategies for combining them into a unified vulnerability detection system using a multiclass approach. Although the lack of large, high-quality datasets for vulnerability detection remains a major obstacle, our results show that multiclass detection can be a better path toward practical vulnerability detection in the future. All our models and the code to reproduce our results are open-sourced.
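The three labeling setups contrasted in the abstract (a single binary detector, one classifier per CWE, and a unified multiclass detector) can be illustrated with a minimal sketch. This is not the authors' pipeline: the `Sample` type, the function names, the toy code snippets, and the choice to train each CWE-specific classifier on that CWE's vulnerable samples plus the non-vulnerable samples are all assumptions made for illustration.

```python
from typing import List, NamedTuple, Optional, Tuple


class Sample(NamedTuple):
    """Hypothetical dataset record: a code snippet and its CWE tag."""
    code: str
    cwe: Optional[str]  # None means the snippet is non-vulnerable


def binary_labels(samples: List[Sample]) -> List[int]:
    """Single binary detector: every CWE collapses into label 1."""
    return [0 if s.cwe is None else 1 for s in samples]


def cwe_specific_subset(
    samples: List[Sample], target_cwe: str
) -> List[Tuple[Sample, int]]:
    """One classifier per CWE (assumed setup): keep only samples of the
    target CWE (label 1) and non-vulnerable samples (label 0)."""
    return [
        (s, 1 if s.cwe == target_cwe else 0)
        for s in samples
        if s.cwe is None or s.cwe == target_cwe
    ]


def multiclass_labels(samples: List[Sample], cwes: List[str]) -> List[int]:
    """Unified multiclass detector: class 0 is non-vulnerable, and each
    CWE in `cwes` gets its own class index starting at 1."""
    index = {cwe: i + 1 for i, cwe in enumerate(cwes)}
    return [0 if s.cwe is None else index[s.cwe] for s in samples]


# Toy data, invented for illustration only.
samples = [
    Sample("strcpy(buf, src);", "CWE-787"),
    Sample("free(p); free(p);", "CWE-415"),
    Sample("return a + b;", None),
]

binary = binary_labels(samples)
double_free_only = cwe_specific_subset(samples, "CWE-415")
multiclass = multiclass_labels(samples, ["CWE-415", "CWE-787"])
```

In the binary setup both vulnerable snippets share one label, so the model receives no signal that separates an out-of-bounds write from a double free. The CWE-specific and multiclass setups keep that distinction in the labels, which is the property the abstract's hypothesis relies on.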
