LLMSA: A Compositional Neuro-Symbolic Approach to Compilation-free and Customizable Static Analysis

Chengpeng Wang, Yifei Gao, Wuqi Zhang, Xuwei Liu, Qingkai Shi, Xiangyu Zhang

TL;DR

LLMSA introduces a compilation-free, customizable static-analysis framework that combines neural and symbolic reasoning through a restricted Datalog-style analysis policy. By decomposing problems into sub-tasks and using parsing-based symbolic relations alongside neural relation specifications, LLMSA reduces LLM hallucinations via lazy, incremental, and parallel prompting, while maintaining a fixed-point evaluation over a worklist driven by a rule dependency graph. Empirical results across alias analysis, program slicing, and taint/bug-detection tasks demonstrate competitive precision and recall, with substantial speedups over ablations and strong performance on real-world datasets like TaintBench. The work highlights practical usability benefits, including low manual effort for customization and compilation-free applicability, though it acknowledges remaining challenges in hallucinations and scalability and outlines future directions for tighter integration with domain knowledge and lighter-weight models.

Abstract

Static analysis is essential for program optimization, bug detection, and debugging, but its reliance on compilation and limited customization hampers practical use. Advances in LLMs enable a new paradigm of compilation-free, customizable analysis via prompting. LLMs excel at interpreting program semantics on small code snippets and allow users to define analysis tasks in natural language with few-shot examples. However, misalignment with program semantics can cause hallucinations, especially in sophisticated semantic analysis over lengthy code snippets. We propose LLMSA, a compositional neuro-symbolic approach for compilation-free, customizable static analysis with reduced hallucinations. Specifically, we propose an analysis policy language that lets users decompose an analysis problem into several sub-problems, each targeting simple syntactic or semantic properties of smaller code snippets. The problem decomposition enables the LLMs to target more manageable semantic-related sub-problems, while the syntactic ones are resolved by parsing-based analysis without hallucinations. An analysis policy is evaluated with lazy, incremental, and parallel prompting, which mitigates hallucinations and improves performance. LLMSA achieves comparable and even superior performance to existing techniques in various clients. For instance, it attains 66.27% precision and 78.57% recall in taint vulnerability detection, surpassing an industrial approach in F1 score by 0.20.
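To make the decomposition idea concrete, the following is a minimal sketch, not the LLMSA implementation. It assumes a toy statement format, hypothetical relation names (`Assign`, `CallDef`, `CallArg`, `IsSource`, `IsSink`, `Tainted`, `Bug`), and a `NeuralRelation` class whose `oracle` is a deterministic stand-in for an LLM prompt. Symbolic relations come from the (pre-parsed) program, neural relations are queried lazily and memoized, and the recursive rule is evaluated to a fixed point with a worklist.

```python
from typing import Callable, Dict, List, Optional, Set, Tuple

# Hypothetical toy statement: (line, defined_var, used_vars, callee)
Stmt = Tuple[int, Optional[str], Tuple[str, ...], Optional[str]]

PROGRAM: List[Stmt] = [
    (1, "a", (), "read_input"),  # a = read_input()
    (2, "b", ("a",), None),      # b = a
    (3, "c", (), None),          # c = "const"
    (4, None, ("b",), "exec"),   # exec(b)
    (5, None, ("c",), "exec"),   # exec(c)
    (6, None, ("c",), "log"),    # log(c)
]


class NeuralRelation:
    """Stand-in for an LLM-backed unary relation (assumption: no real LLM call).

    Each distinct argument is "prompted" at most once; answers are memoized.
    """

    def __init__(self, name: str, oracle: Callable[[str], bool]) -> None:
        self.name = name
        self.oracle = oracle
        self.cache: Dict[str, bool] = {}
        self.prompts = 0

    def holds(self, arg: str) -> bool:
        if arg not in self.cache:
            self.prompts += 1  # one simulated prompt per new argument
            self.cache[arg] = self.oracle(arg)
        return self.cache[arg]


def analyze(program: List[Stmt]) -> Tuple[Set[int], int, int]:
    is_source = NeuralRelation("IsSource", lambda f: f == "read_input")
    is_sink = NeuralRelation("IsSink", lambda f: f == "exec")

    # Symbolic relations: derived purely from program structure.
    assign = {(u, d) for _, d, uses, _ in program if d is not None for u in uses}
    call_def = {(d, f) for _, d, _, f in program if d is not None and f is not None}
    call_arg = {(ln, f, u) for ln, _, uses, f in program if f is not None for u in uses}

    # Tainted(v) :- CallDef(v, f), IsSource(f).
    tainted: Set[str] = {v for v, f in call_def if is_source.holds(f)}

    # Tainted(d) :- Tainted(s), Assign(s, d).   (worklist fixed point)
    worklist = list(tainted)
    while worklist:
        s = worklist.pop()
        for src, dst in assign:
            if src == s and dst not in tainted:
                tainted.add(dst)
                worklist.append(dst)

    # Bug(ln) :- CallArg(ln, f, v), Tainted(v), IsSink(f).
    # Lazy: IsSink is only queried for calls whose argument is already tainted.
    bugs = {ln for ln, f, v in call_arg if v in tainted and is_sink.holds(f)}
    return bugs, is_source.prompts, is_sink.prompts


if __name__ == "__main__":
    print(analyze(PROGRAM))
```

In this sketch the sink check is placed last in the rule body, so `log` is never sent to the simulated model because its argument is untainted, and `exec` is asked about only once despite appearing in two calls. Real policies would replace the lambdas with actual prompts and the tuple list with parser output.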

Paper Structure

This paper contains 39 sections, 2 theorems, 6 equations, 11 figures, 7 tables.

Key Result

Theorem 1

The sequential version of the worklist-based evaluation algorithm requires a minimal number of prompting rounds if each neural relation is populated by the same constrained neural constructor when evaluating different Datalog rules.

Figures (11)

  • Figure 1: Two motivating examples of compilation-free and customizable static analysis
  • Figure 2: The workflow of LLMSA
  • Figure 3: The examples of analysis policy and neural relation specification. In the sub-figure (a), the symbolic, neural, and intensional relations are in blue, red, and black, respectively.
  • Figure 4: The syntax of the analysis policy language
  • Figure 5: An analysis policy of intra-procedural XSS detection. The neural relations are in red.
  • ...and 6 more figures

Theorems & Definitions (22)

  • Example 3.1
  • Example 3.2
  • Definition 3.1
  • Example 3.3
  • Definition 4.1
  • Definition 4.2
  • Example 4.1
  • Definition 4.3
  • Example 4.2
  • Definition 4.4
  • ...and 12 more