QLCoder: A Query Synthesizer For Static Analysis of Security Vulnerabilities
Claire Wang, Ziyang Li, Saikat Dutta, Mayur Naik
TL;DR
QLCoder tackles the challenge of automatically generating precise static-analysis queries from CVE descriptions by embedding an LLM in a structured, execution-guided synthesis loop constrained by a Model Context Protocol (MCP) interface. It combines retrieval-augmented, domain-aware context with a CodeQL Language Server for syntax guidance to produce syntactically valid and semantically precise queries that discriminate vulnerable from patched code. In an evaluation on CWE-Bench-Java (176 CVEs across 111 Java projects), every synthesized query compiles and QLCoder reaches a query success rate of 53.4%, substantially outperforming baselines such as Gemini CLI and IRIS, with an average F1 around 0.70. The work shows strong potential for regression testing, variant analysis, and patch validation by bridging vulnerability reports to executable CodeQL queries, and it outlines limitations and avenues for broader language support and integration with dynamic analysis.
Abstract
Static analysis tools provide a powerful means to detect security vulnerabilities by specifying queries that encode vulnerable code patterns. However, writing such queries is challenging and requires diverse expertise in security and program analysis. To address this challenge, we present QLCoder, an agentic framework that automatically synthesizes queries in CodeQL, a powerful static analysis engine, directly from given CVE metadata. QLCoder embeds an LLM in a synthesis loop with execution feedback, while constraining its reasoning through a custom Model Context Protocol (MCP) interface that allows structured interaction with a Language Server Protocol (LSP) server (for syntax guidance) and a retrieval-augmented generation (RAG) database (for semantic retrieval of queries and documentation). This approach allows QLCoder to generate syntactically and semantically valid security queries. We evaluate QLCoder on 176 existing CVEs across 111 Java projects. Building upon the Claude Code agent framework, QLCoder synthesizes correct queries, which detect the CVE in the vulnerable version but not in the patched version, for 53.4% of CVEs. In comparison, Claude Code alone synthesizes correct queries for only 10% of CVEs.
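The synthesis loop with execution feedback, and the criterion that a correct query detects the CVE in the vulnerable version but not in the patched one, can be pictured with a minimal Python sketch. It is an illustration under stated assumptions, not QLCoder's implementation: `Feedback`, `synthesize`, `propose`, `evaluate`, and `max_iters` are hypothetical names, the LLM call and the CodeQL compile-and-run step are abstracted as plain callables, and the success test is simplified to result counts.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Feedback:
    """Outcome of running one candidate query (hypothetical structure)."""
    compiled: bool
    hits_vulnerable: int  # results reported on the vulnerable version
    hits_patched: int     # results reported on the patched version

    @property
    def success(self) -> bool:
        # Simplified criterion: the query compiles, flags the vulnerable
        # version, and reports nothing on the patched version.
        return self.compiled and self.hits_vulnerable > 0 and self.hits_patched == 0


def synthesize(
    propose: Callable[[Optional[Feedback]], str],
    evaluate: Callable[[str], Feedback],
    max_iters: int = 5,
) -> Optional[str]:
    """Propose a query, run it, and feed the result back until it succeeds."""
    feedback: Optional[Feedback] = None
    for _ in range(max_iters):
        query = propose(feedback)   # stand-in for the LLM call
        feedback = evaluate(query)  # stand-in for compiling and running the query
        if feedback.success:
            return query
    return None  # iteration budget exhausted without a discriminating query
```

Passing the previous `Feedback` back into `propose` is what makes the loop execution-guided: a compile failure or a query that also fires on the patched version becomes input for the next attempt rather than a dead end.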
