Scaling Symbolic Execution to Large Software Systems
Gabor Horvath, Reka Kovacs, Zoltan Porkolab
TL;DR
This paper addresses scaling symbolic execution to large software systems by integrating it with the Clang Static Analyzer and CodeChecker to achieve end-to-end scalability. It presents a holistic pipeline that combines memory modeling, context-sensitive inlining, and diverse exploration strategies with a two-pass cross-translation-unit architecture (CTU) to extend analysis across translation units, aided by caching and AST merging. It also explores summary-based analyses, loop modeling, and a suite of checks, plus refutation using a more powerful solver (Z3) to reduce false positives, albeit with runtime tradeoffs. The work demonstrates practical implications for industrial-scale code bases through CodeChecker’s tooling, two-pass CTU workflows, and differential analysis, highlighting tradeoffs between precision, performance, and developer usability. Together, these contributions advance scalable static analysis by enabling deeper interprocedural reasoning while maintaining acceptable resource usage in real-world projects.
Abstract
Static analysis is the analysis of a program without executing it, usually carried out by an automated tool. Symbolic execution is a popular static analysis technique used both in program verification and in bug detection software. It works by interpreting the code, introducing a symbol for each value unknown at compile time (e.g. user-given inputs), and carrying out calculations symbolically. The analysis engine strives to explore multiple execution paths simultaneously, although checking all paths is an intractable problem, due to the vast number of possibilities. We focus on an error finding framework called the Clang Static Analyzer, and the infrastructure built around it named CodeChecker. The emphasis is on achieving end-to-end scalability. This includes the run time and memory consumption of the analysis, bug presentation to the users, automatic false positive suppression, incremental analysis, pattern discovery in the results, and usage in continuous integration loops. We also outline future directions and open problems concerning these tools. While a rich literature exists on program verification software, error finding tools normally need to settle for survey papers on individual techniques. In this paper, we not only discuss individual methods, but also how these decisions interact and reinforce each other, creating a system that is greater than the sum of its parts. Although the Clang Static Analyzer can only handle C-family languages, the techniques introduced in this paper are mostly language-independent and applicable to other similar static analysis tools.
