VULSOLVER: Vulnerability Detection via LLM-Driven Constraint Solving
Xiang Li, Yueci Su, Jiahao Liu, Zhiwei Lin, Yuebing Hou, Peiming Gao, Yuanchao Zhang
TL;DR
The paper tackles the reliability and scalability challenges of vulnerability detection in large codebases by reframing detection as a path-based constraint-solving problem. It introduces VULSOLVER, which integrates static analysis with LLM-driven semantic reasoning and decomposes the task into subtasks along call paths using transfer and trigger constraints. Core contributions include the formalization of transfer and trigger constraints, Branch Method Analysis, Context Maintenance, Main Path Analysis, language-agnostic Code Information Summary generation, and a deterministic solving pipeline. On the OWASP Benchmark, VULSOLVER reaches 97.85% accuracy and 100% recall, and testing on real-world open-source projects uncovers 15 previously unknown high-severity vulnerabilities, demonstrating practical value and robustness relative to existing approaches.
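To make the path-based framing concrete, here is a minimal Python sketch of a call path that is reported only when every per-call transfer constraint and the final trigger constraint hold. It is not the authors' implementation. It assumes that a transfer constraint is the condition under which attacker-controlled data passes from one method to the next, and that a trigger constraint is the condition under which the sink is actually exploitable. The names `Step`, `CallPath`, and `is_vulnerable`, the state keys, and the example method names are all hypothetical. Plain Python predicates stand in for the judgments that the paper delegates to an LLM.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Simplified view of what is known at a call site (hypothetical keys).
State = Dict[str, bool]


@dataclass
class Step:
    """One caller -> callee hop on a call path."""
    caller: str
    callee: str
    # Stand-in for a transfer constraint that an LLM would evaluate.
    transfer: Callable[[State], bool]


@dataclass
class CallPath:
    """A source-to-sink path plus the trigger constraint at the sink."""
    steps: List[Step]
    # Stand-in for the trigger constraint at the sink.
    trigger: Callable[[State], bool]


def is_vulnerable(path: CallPath, state: State) -> bool:
    """Report the path only if all transfer constraints and the trigger hold."""
    transfers_hold = all(step.transfer(state) for step in path.steps)
    return transfers_hold and path.trigger(state)


# Hypothetical SQL-injection path: doPost -> buildQuery -> executeQuery.
example_path = CallPath(
    steps=[
        Step("doPost", "buildQuery", lambda s: s["input_is_tainted"]),
        Step("buildQuery", "executeQuery", lambda s: s["branch_reaches_sink"]),
    ],
    trigger=lambda s: not s["input_is_sanitized"],
)

unsafe_state = {
    "input_is_tainted": True,
    "branch_reaches_sink": True,
    "input_is_sanitized": False,
}
safe_state = dict(unsafe_state, input_is_sanitized=True)

print(is_vulnerable(example_path, unsafe_state))
print(is_vulnerable(example_path, safe_state))
```

Splitting the decision into small per-hop checks is what keeps each question short. In this sketch each predicate sees only a small state dictionary, which mirrors the idea of giving the model one narrow subtask at a time instead of the whole codebase.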
Abstract
Traditional vulnerability detection methods rely heavily on predefined rule matching, which often fails to capture vulnerabilities accurately. With the rise of large language models (LLMs), leveraging their ability to understand code semantics has emerged as a promising direction for more accurate and efficient vulnerability detection. However, current LLM-based approaches face significant challenges: unstable model outputs, degraded performance on long contexts, and hallucination. As a result, many existing solutions either use LLMs merely to enrich predefined rule sets, which keeps the detection process fundamentally rule-based, or over-rely on them, which leads to poor robustness. To address these challenges, we propose VULSOLVER, an LLM-powered constraint-solving approach. By modeling vulnerability detection as a constraint-solving problem and integrating static application security testing (SAST) with the semantic reasoning capabilities of LLMs, our method enables the LLM to act like a professional human security expert. We assess VULSOLVER on the OWASP Benchmark (1,023 labeled samples), achieving 97.85% accuracy, 97.97% F1-score, and 100% recall. Applied to widely used open-source projects, VULSOLVER identified 15 previously unknown high-severity vulnerabilities (CVSS 7.5-9.8), demonstrating its effectiveness in real-world security analysis.
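The abstract reports F1-score and recall but not precision. Since F1 is the harmonic mean of precision and recall, the precision implied by the reported figures can be recovered with a line of arithmetic. The sketch below does that. The resulting value is derived here from the rounded numbers above, not stated by the authors, and the function name `implied_precision` is hypothetical.

```python
def implied_precision(f1: float, recall: float) -> float:
    """Solve F1 = 2PR / (P + R) for the precision P."""
    denominator = 2 * recall - f1
    if denominator <= 0:
        raise ValueError("no valid precision exists for this F1/recall pair")
    return f1 * recall / denominator


# Figures reported in the abstract (rounded to two decimals there).
precision = implied_precision(f1=0.9797, recall=1.0)
print(f"implied precision: {precision:.2%}")
```

With 100% recall every error is a false positive, so the implied precision of roughly 96% is the share of reported findings that are true vulnerabilities on the benchmark.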
