QLCoder: A Query Synthesizer For Static Analysis of Security Vulnerabilities

Claire Wang, Ziyang Li, Saikat Dutta, Mayur Naik

TL;DR

QLCoder addresses the challenge of automatically generating precise static-analysis queries from CVE descriptions. It embeds an LLM in a structured, execution-guided synthesis loop whose reasoning is constrained through a custom Model Context Protocol (MCP) interface. It combines retrieval-augmented, domain-aware context with a CodeQL language server for syntax guidance, producing syntactically valid, semantically precise queries that discriminate vulnerable from patched code. The evaluation covers CWE-Bench-Java (176 CVEs across 111 Java projects). QLCoder achieves full query compilation and a success rate of 53.4%, meaning it produces correct queries that detect the CVE in the vulnerable version but not in the patched version. This substantially outperforms baselines such as Gemini CLI and IRIS, with an average F1 of around 0.70. By bridging vulnerability reports and executable CodeQL queries, the work shows strong potential for regression testing, variant analysis, and patch validation. It also outlines limitations and avenues for broader language support and integration with dynamic analysis.
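The requirement that a query discriminate vulnerable from patched code, together with an F1-style score, can be made concrete with a small Python sketch. It rests on three assumptions that are not stated in the summary. First, each query result is reduced to a hypothetical (file, line) location. Second, ground truth is the set of vulnerable locations. Third, the patched version counts as clean only if it yields no alerts at all, which is a conservative simplification. The paper's exact matching rule and F1 definition may differ.

```python
from typing import Set, Tuple

# Hypothetical granularity: an alert is identified by (file path, line number).
Location = Tuple[str, int]


def is_success(vuln_alerts: Set[Location], patched_alerts: Set[Location],
               ground_truth: Set[Location]) -> bool:
    """Success: at least one ground-truth location is flagged in the
    vulnerable version, and (simplification) the patched version is silent."""
    return bool(vuln_alerts & ground_truth) and not patched_alerts


def f1_score(alerts: Set[Location], ground_truth: Set[Location]) -> float:
    """Standard F1: harmonic mean of precision and recall over locations."""
    true_pos = len(alerts & ground_truth)
    if true_pos == 0:
        return 0.0
    precision = true_pos / len(alerts)
    recall = true_pos / len(ground_truth)
    return 2 * precision * recall / (precision + recall)
```

Consider a query that flags the single ground-truth location plus one unrelated location. It has precision 0.5 and recall 1.0, so its F1 is 2/3 even though it still counts as a success under the criterion above.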

Abstract

Static analysis tools provide a powerful means to detect security vulnerabilities by specifying queries that encode vulnerable code patterns. However, writing such queries is challenging and requires diverse expertise in security and program analysis. To address this challenge, we present QLCoder, an agentic framework that automatically synthesizes queries in CodeQL, a powerful static analysis engine, directly from given CVE metadata. QLCoder embeds an LLM in a synthesis loop with execution feedback. It constrains the LLM's reasoning using a custom MCP interface that allows structured interaction with a language server via the Language Server Protocol (for syntax guidance) and with a RAG database (for semantic retrieval of queries and documentation). This approach allows QLCoder to generate syntactically and semantically valid security queries. We evaluate QLCoder on 176 existing CVEs across 111 Java projects. Building on the Claude Code agent framework, QLCoder synthesizes correct queries for 53.4% of CVEs, where a correct query detects the CVE in the vulnerable version but not in the patched version. In comparison, Claude Code alone synthesizes correct queries for only 10% of CVEs.
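The synthesis loop with execution feedback can be sketched in Python. In this sketch, `generate` stands in for the LLM agent with its MCP tools, and `validate` stands in for the CodeQL-based checker. Both callables, the `Feedback` record, the `max_iters` budget, and the stopping rule are hypothetical names chosen for illustration. They are not QLCoder's actual interfaces.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Feedback:
    """Hypothetical summary of one validation round."""
    compiled: bool         # did the candidate query compile?
    hits_vulnerable: bool  # does it report the CVE in the vulnerable version?
    hits_patched: bool     # does it (wrongly) still report it after the fix?
    messages: List[str] = field(default_factory=list)  # e.g. compiler errors

    @property
    def solved(self) -> bool:
        return self.compiled and self.hits_vulnerable and not self.hits_patched


def synthesize(cve: str,
               generate: Callable[[str, List[Feedback]], str],
               validate: Callable[[str], Feedback],
               max_iters: int = 10) -> Optional[str]:
    """Generate-validate loop: each candidate is conditioned on the CVE
    description plus all feedback gathered in earlier rounds."""
    history: List[Feedback] = []
    for _ in range(max_iters):
        query = generate(cve, history)
        fb = validate(query)
        if fb.solved:
            return query
        history.append(fb)
    return None  # budget exhausted without a discriminating query
```

The sketch passes the full `history` back to the generator. This mirrors the idea that each round learns from compiler errors and from unwanted hits on the patched version. A real agent would more likely summarize that history than replay it verbatim.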

Paper Structure

This paper contains 33 sections, 1 equation, 5 figures, 6 tables.

Figures (5)

  • Figure 1: A CodeQL query capturing a vulnerability pattern is synthesized by QLCoder from an existing CVE and subsequently reused for regression testing, variant analysis, or patch validation.
  • Figure 2: Illustration of vulnerability CVE-2025-27136 in the repository Robothy/local-s3, which exhibits an XML External Entity (XXE) injection weakness (CWE-611). When the XmlMapper is not configured to disable Document Type Definitions (DTDs), XML input parsed by readValue may declare additional entities, allowing attackers to inject malicious behavior.
  • Figure 3: Overall pipeline of QLCoder's iterative synthesis loop between an agentic query generator and a CodeQL-based validator. The generator uses a vector database and our CodeQL Language Server as tools, while the validator produces compilation, execution, and coverage feedback.
  • Figure 4: Illustration of example conversation traces from the synthesis of the query for the motivating example (Figure 2). The LLM agent may think, invoke tools available in its toolbox, and receive responses from the MCP servers.
  • Figure 5: Recall Rate Comparison by CWE Type Across Different Methods (102 CVEs).
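The validator in Figure 3 can be approximated with the real CodeQL CLI and its SARIF output. The sketch below does not run CodeQL. It only builds the `codeql query compile` and `codeql database analyze` command lines and interprets SARIF results from the vulnerable and patched databases. Several details are assumptions for illustration: that the validator uses SARIF, the feedback wording, and the helper names. Coverage feedback is not modeled.

```python
from typing import Dict, List, Set, Tuple

Location = Tuple[str, int]  # (artifact URI, start line)


def compile_command(query: str) -> List[str]:
    """CodeQL CLI invocation that compiles a query (compilation feedback)."""
    return ["codeql", "query", "compile", query]


def analyze_command(database: str, query: str, output: str) -> List[str]:
    """CodeQL CLI invocation that runs one query and writes SARIF results."""
    return ["codeql", "database", "analyze", database, query,
            "--format=sarif-latest", f"--output={output}"]


def sarif_locations(sarif: Dict) -> Set[Location]:
    """Collect (uri, startLine) for the primary location of every result."""
    locs: Set[Location] = set()
    for run in sarif.get("runs", []):
        for result in run.get("results", []):
            for loc in result.get("locations", [])[:1]:
                phys = loc.get("physicalLocation", {})
                uri = phys.get("artifactLocation", {}).get("uri")
                line = phys.get("region", {}).get("startLine")
                if uri is not None and line is not None:
                    locs.add((uri, line))
    return locs


def execution_feedback(vuln_sarif: Dict, patched_sarif: Dict) -> str:
    """Turn results on both versions into a hypothetical message for the agent."""
    vuln = sarif_locations(vuln_sarif)
    patched = sarif_locations(patched_sarif)
    if not vuln:
        return "No alerts on the vulnerable version: query is too narrow."
    if patched:
        return f"{len(patched)} alert(s) remain on the patched version: query is too broad."
    return f"{len(vuln)} alert(s) on the vulnerable version only."
```

Keeping command construction separate from result interpretation lets the interpretation logic be tested on hand-written SARIF without a CodeQL installation.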