Table of Contents
Fetching ...

Scaling Symbolic Execution to Large Software Systems

Gabor Horvath, Reka Kovacs, Zoltan Porkolab

TL;DR

This paper addresses scaling symbolic execution to large software systems by integrating it with the Clang Static Analyzer and CodeChecker to achieve end-to-end scalability. It presents a holistic pipeline that combines memory modeling, context-sensitive inlining, and diverse exploration strategies with a two-pass cross-translation-unit architecture (CTU) to extend analysis across translation units, aided by caching and AST merging. It also explores summary-based analyses, loop modeling, and a suite of checks, plus refutation using a more powerful solver (Z3) to reduce false positives, albeit with runtime tradeoffs. The work demonstrates practical implications for industrial-scale code bases through CodeChecker’s tooling, two-pass CTU workflows, and differential analysis, highlighting tradeoffs between precision, performance, and developer usability. Together, these contributions advance scalable static analysis by enabling deeper interprocedural reasoning while maintaining acceptable resource usage in real-world projects.

Abstract

Static analysis is the analysis of a program without executing it, usually carried out by an automated tool. Symbolic execution is a popular static analysis technique used both in program verification and in bug detection software. It works by interpreting the code, introducing a symbol for each value unknown at compile time (e.g. user-given inputs), and carrying out calculations symbolically. The analysis engine strives to explore multiple execution paths simultaneously, although checking all paths is an intractable problem, due to the vast number of possibilities. We focus on an error finding framework called the Clang Static Analyzer, and the infrastructure built around it named CodeChecker. The emphasis is on achieving end-to-end scalability. This includes the run time and memory consumption of the analysis, bug presentation to the users, automatic false positive suppression, incremental analysis, pattern discovery in the results, and usage in continuous integration loops. We also outline future directions and open problems concerning these tools. While a rich literature exists on program verification software, error finding tools normally need to settle for survey papers on individual techniques. In this paper, we not only discuss individual methods, but also how these decisions interact and reinforce each other, creating a system that is greater than the sum of its parts. Although the Clang Static Analyzer can only handle C-family languages, the techniques introduced in this paper are mostly language-independent and applicable to other similar static analysis tools.

Scaling Symbolic Execution to Large Software Systems

TL;DR

This paper addresses scaling symbolic execution to large software systems by integrating it with the Clang Static Analyzer and CodeChecker to achieve end-to-end scalability. It presents a holistic pipeline that combines memory modeling, context-sensitive inlining, and diverse exploration strategies with a two-pass cross-translation-unit architecture (CTU) to extend analysis across translation units, aided by caching and AST merging. It also explores summary-based analyses, loop modeling, and a suite of checks, plus refutation using a more powerful solver (Z3) to reduce false positives, albeit with runtime tradeoffs. The work demonstrates practical implications for industrial-scale code bases through CodeChecker’s tooling, two-pass CTU workflows, and differential analysis, highlighting tradeoffs between precision, performance, and developer usability. Together, these contributions advance scalable static analysis by enabling deeper interprocedural reasoning while maintaining acceptable resource usage in real-world projects.

Abstract

Static analysis is the analysis of a program without executing it, usually carried out by an automated tool. Symbolic execution is a popular static analysis technique used both in program verification and in bug detection software. It works by interpreting the code, introducing a symbol for each value unknown at compile time (e.g. user-given inputs), and carrying out calculations symbolically. The analysis engine strives to explore multiple execution paths simultaneously, although checking all paths is an intractable problem, due to the vast number of possibilities. We focus on an error finding framework called the Clang Static Analyzer, and the infrastructure built around it named CodeChecker. The emphasis is on achieving end-to-end scalability. This includes the run time and memory consumption of the analysis, bug presentation to the users, automatic false positive suppression, incremental analysis, pattern discovery in the results, and usage in continuous integration loops. We also outline future directions and open problems concerning these tools. While a rich literature exists on program verification software, error finding tools normally need to settle for survey papers on individual techniques. In this paper, we not only discuss individual methods, but also how these decisions interact and reinforce each other, creating a system that is greater than the sum of its parts. Although the Clang Static Analyzer can only handle C-family languages, the techniques introduced in this paper are mostly language-independent and applicable to other similar static analysis tools.
Paper Structure (46 sections, 18 figures, 3 tables)

This paper contains 46 sections, 18 figures, 3 tables.

Figures (18)

  • Figure 1: A C function and its simplified control flow graph.
  • Figure 2: A simplified version of the exploded graph built during the symbolic execution of a simple function. The right-hand side of the graph represents the true branch of the if statement, while the left-hand side describes the false branch.
  • Figure 3: Left: An expression representing field f of an object that is stored as the second element of array a on the stack. Right: The region hierarchy describing the memory layout of the expression on the left.
  • Figure 4: In the Clang Static Analyzer, an inlined function will not be re-analyzed as a top-level function. The analyzer only checks function f in the calling context of function g, thus it does not find the division by zero error in f.
  • Figure 5: An illustration of the depth-first search (DFS) exploration strategy.
  • ...and 13 more figures