Unveiling the Power of Intermediate Representations for Static Analysis: A Survey

Bowen Zhang, Wei Chen, Hung-Chun Chiu, Charles Zhang

TL;DR

The paper critically surveys how Intermediate Representations (IRs) influence static analysis, arguing that IR design (syntax, vocabulary, queries, and preprocessing) shapes versatility, performance, and productivity. It dissects IR syntaxes (AST, CFG, SSA, PDG) and equivalence and dependence constructs (DDG, CDG, PDG), and discusses how domain-specific IRs (parallel, hardware, ML) extend vocabularies and abstractions via MLIR and dialects. The authors then review IR query mechanisms (IR-centered and analysis-centered) and preprocessing techniques (unification, simplification, reduction), offering practical guidance and a future research agenda. Overall, the survey provides a concrete manual for learners and practitioners and identifies significant opportunities for cross-language IR design, formalization, and tool support. The work emphasizes that careful IR design can unlock robust, scalable static analyses across diverse languages and domains.
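To make the first of the IR syntaxes named above concrete, here is a minimal sketch of inspecting an AST with Python's standard `ast` module. The sample `fib` function, the helper names `statement_kinds` and `nesting_depth`, and the choice of Python are assumptions made for illustration; this is not the code from the paper's Figure 2, and the paper's formal AST definition may differ in detail from what `ast` produces.

```python
import ast

# Hypothetical sample program, written for this sketch only.
SOURCE = """
def fib(n):
    a, b = 0, 1
    while n > 0:
        a, b = b, a + b
        n = n - 1
    return a
"""


def statement_kinds(source: str) -> list[str]:
    """Return the node type of each statement directly inside the first function."""
    tree = ast.parse(source)
    func = tree.body[0]  # assumes the first top-level statement is a function
    return [type(stmt).__name__ for stmt in func.body]


def nesting_depth(node: ast.AST) -> int:
    """Depth of syntactic nesting below `node`; a node without children has depth 1."""
    children = list(ast.iter_child_nodes(node))
    return 1 + max((nesting_depth(child) for child in children), default=0)


if __name__ == "__main__":
    print(statement_kinds(SOURCE))           # statement types in source order
    print(nesting_depth(ast.parse(SOURCE)))  # how deeply the syntax is nested
```

The sketch only exposes syntactic nesting. Control flow order, value equivalence, and dependence are not explicit in this tree, and the other representations the survey covers (CFG, SSA, DDG, CDG, PDG) are the ones that encode them.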

Abstract

Static analysis techniques enhance the security, performance, and reliability of programs by analyzing and portraying program behaviors without the need for actual execution. In essence, static analysis takes the Intermediate Representation (IR) of a target program as input to retrieve essential program information and understand the program. However, the benefits of IR for static analysis, beyond serving as an information provider, have not been systematically analyzed. In general, a modern static analysis framework should be able to conduct diverse analyses on different languages, produce reliable results with minimal time consumption, and offer extensive customization options. In this survey, we systematically characterize these goals and review the potential solutions from the perspective of IR. It can serve as a manual for learners and practitioners in the static analysis field to better understand IR design. It also reveals numerous research opportunities for researchers.

Paper Structure

This paper contains 38 sections, 2 equations, 8 figures, 6 tables.

Figures (8)

  • Figure 1: Program Concepts
  • Figure 2: A code example of calculating Fibonacci
  • Figure 3: The AST, CFG, and SSA corresponding to the code in Fig. 2
  • Figure 4: The DDG for the code in Fig. 2
  • Figure 5: The CDG for the code in Fig. 2
  • ...and 3 more figures

Theorems & Definitions (14)

  • Definition 1: Syntactic Nesting Relation
  • Definition 2: Abstract Syntax Tree
  • Definition 3: Control Flow Order
  • Definition 4: Statement-level Control Flow Graph
  • Definition 5: Control Flow Graph
  • Definition 6: Inter-procedural Control Flow Graph
  • Definition 7: Syntactic Equivalence
  • Definition 8: Value Equivalence
  • Definition 9: Static Single Assignment Form
  • Definition 10: Data Dependence
  • ...and 4 more