One-for-All Does Not Work! Enhancing Vulnerability Detection by Mixture-of-Experts (MoE)
Xu Yang, Shaowei Wang, Jiayuan Zhou, Wenhan Zhu
TL;DR
This work tackles the persistent one-for-all limitation of deep-learning vulnerability detection by introducing MoEVD, a mixture-of-experts framework that partitions vulnerability detection by CWE type. By training CWE-specific experts together with a CWE-type router, MoEVD achieves an F1-score of 0.44 on BigVul, outperforming all studied baselines by at least 12.8% and delivering consistent recall gains on both common and long-tailed CWE types. The approach explicitly decomposes the task into CWE-type classification and CWE-specific vulnerability detection, and uses an aggregation mechanism that combines the top-K experts selected by the router at inference time. The study points to strong practical potential for real-world deployment, provides replication data, and identifies improving the router and exploring more advanced MoE architectures as directions for further gains in vulnerability detection.
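The TL;DR states that MoEVD aggregates the top-K experts chosen by the router, but the exact aggregation rule is not given here. The Python below is therefore only a minimal sketch under stated assumptions: the router is assumed to output a probability per CWE type, each expert is assumed to output a probability that the code is vulnerable, and the selected experts are combined as a weighted average using router weights renormalized over the top K. The function `top_k_moe_score`, the toy experts, and the 0.5 decision threshold are hypothetical and are not MoEVD's actual interface.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical interface: an "expert" maps source code to P(vulnerable).
Expert = Callable[[str], float]


def top_k_moe_score(
    code: str,
    router_probs: Dict[str, float],
    experts: Dict[str, Expert],
    k: int = 2,
) -> Tuple[float, List[str]]:
    """Combine the K most likely CWE experts into one vulnerability score.

    router_probs: assumed router output, P(CWE type | code), keyed by CWE ID.
    experts: one hypothetical binary detector per CWE type.
    Returns the aggregated score and the CWE types that were consulted.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    # Only CWE types that actually have an expert can be selected.
    candidates = [cwe for cwe in router_probs if cwe in experts]
    top_k = sorted(candidates, key=lambda cwe: router_probs[cwe], reverse=True)[:k]
    total = sum(router_probs[cwe] for cwe in top_k)
    if not top_k or total <= 0:
        raise ValueError("router assigns no probability to any available expert")
    # Weighted average of the selected experts, using renormalized router weights.
    score = sum(router_probs[cwe] / total * experts[cwe](code) for cwe in top_k)
    return score, top_k


if __name__ == "__main__":
    # Toy experts returning fixed probabilities (stand-ins for trained models).
    experts = {
        "CWE-119": lambda code: 0.9,
        "CWE-20": lambda code: 0.3,
        "CWE-399": lambda code: 0.1,
    }
    router = {"CWE-119": 0.6, "CWE-20": 0.3, "CWE-399": 0.1}
    score, used = top_k_moe_score("memcpy(buf, src, n);", router, experts, k=2)
    print(used, round(score, 2))  # ['CWE-119', 'CWE-20'] 0.7
    # The 0.5 threshold is an illustrative assumption.
    print("vulnerable" if score >= 0.5 else "benign")
```

Renormalizing over only the selected experts keeps the score in [0, 1] for any K, provided each expert outputs a probability; with `k=1` the sketch reduces to hard routing to the single most likely CWE expert.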
Abstract
Deep Learning-based Vulnerability Detection (DLVD) techniques have garnered significant interest due to their ability to automatically learn vulnerability patterns from previously compromised code. Despite the notable accuracy demonstrated by pioneering tools, the broader application of DLVD methods in real-world scenarios is hindered by significant challenges. A primary issue is the "one-for-all" design, where a single model is trained to handle all types of vulnerabilities. This approach fails to capture the distinct patterns of individual vulnerability types, resulting in suboptimal performance, particularly for less common vulnerabilities that are often underrepresented in training datasets. To address these challenges, we propose MoEVD, which adopts the Mixture-of-Experts (MoE) framework for vulnerability detection. MoEVD decomposes vulnerability detection into two tasks: CWE type classification and CWE-specific vulnerability detection. By splitting the task in this way, MoEVD lets dedicated experts handle distinct types of vulnerabilities instead of handling all vulnerabilities within a single model. Our results show that MoEVD achieves an F1-score of 0.44, significantly outperforming all studied state-of-the-art (SOTA) baselines by at least 12.8%. MoEVD excels across almost all CWE types, improving recall over the best SOTA baseline by 9% to 77.8%. Notably, MoEVD does not sacrifice performance on long-tailed CWE types; instead, its MoE design improves performance (F1-score) on these types by at least 7.3%, effectively addressing the long-tail issue.
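The abstract reports recall per CWE type. One way to compute such a breakdown is to group the ground-truth vulnerable test functions by their CWE label and measure, for each type, the fraction the model flags. The Python below is a minimal sketch assuming each vulnerable function carries exactly one CWE label; the helper names `per_cwe_recall` and `pooled_recall` are hypothetical, and the toy predictions are invented for illustration, not results from BigVul or MoEVD. It shows how a single pooled recall, as a one-for-all evaluation would report, can mask poor detection of a rare, long-tailed type.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Each entry: (CWE type of a ground-truth vulnerable function, did the model flag it?)
Sample = Tuple[str, bool]


def per_cwe_recall(samples: List[Sample]) -> Dict[str, float]:
    """Recall computed separately for each CWE type (vulnerable samples only)."""
    totals = Counter(cwe for cwe, _ in samples)
    hits = Counter(cwe for cwe, predicted in samples if predicted)
    # Counter returns 0 for a CWE type with no detections.
    return {cwe: hits[cwe] / totals[cwe] for cwe in totals}


def pooled_recall(samples: List[Sample]) -> float:
    """A single recall over all CWE types combined."""
    if not samples:
        raise ValueError("need at least one vulnerable sample")
    return sum(predicted for _, predicted in samples) / len(samples)


if __name__ == "__main__":
    # Toy predictions: a frequent CWE type and a rare (long-tailed) one.
    samples = [("CWE-119", True)] * 9 + [("CWE-119", False)] + [("CWE-22", False)] * 2
    print(pooled_recall(samples))   # 0.75
    print(per_cwe_recall(samples))  # {'CWE-119': 0.9, 'CWE-22': 0.0}
```

Here the pooled recall of 0.75 hides the fact that the rare type is never detected, which is why a per-type breakdown matters when judging performance on long-tailed CWE types.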
