PIP: Making Andersen's Points-to Analysis Sound and Practical for Incomplete C Programs
Håvard Rognebakke Krogstie, Helge Bahmann, Magnus Själander, Nico Reissmann
TL;DR
This work tackles the difficulty of obtaining sound points-to information for incomplete C programs by introducing an Andersen-style analysis that tracks externally accessible memory. It replaces explicit pointee tracking with an implicit representation (the Ω concept), which makes the constraint solver 15$\times$ faster than the best explicit configuration, and adds the Prefer Implicit Pointees (PIP) technique for a further 1.9$\times$ speedup. The method is implemented in the jlm compiler and validated on thousands of C files, taking about 1.1 ms per file on average and producing 40% fewer MayAlias responses than LLVM's BasicAA alone. Together, these results make sound, scalable, and precise per-file points-to analysis practical for production compiler workflows.
Abstract
Compiling files individually lends itself well to parallelization, but forces the compiler to operate on incomplete programs. State-of-the-art points-to analyses guarantee sound solutions only for complete programs, requiring summary functions to describe any missing program parts. Summary functions are rarely available in production compilers, however, where soundness and efficiency are non-negotiable. This paper presents an Andersen-style points-to analysis that efficiently produces sound solutions for incomplete C programs. The analysis accomplishes soundness by tracking memory locations and pointers that are accessible from external modules, and efficiency by performing this tracking implicitly in the constraint graph. We show that implicit pointee tracking makes the constraint solver 15$\times$ faster than any combination of five different state-of-the-art techniques using explicit pointee tracking. We also present the Prefer Implicit Pointees (PIP) technique that further reduces the use of explicit pointees. PIP gives an additional speedup of 1.9$\times$ compared to the fastest solver configuration not benefiting from PIP. The precision of the analysis is evaluated in terms of an alias-analysis client, where it reduces the number of MayAlias responses by 40% compared to LLVM's BasicAA pass alone. Finally, we show that the analysis is scalable in terms of memory, making it suitable for optimizing compilers in practice.
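
To make the idea of implicit tracking concrete, the following is a minimal Python sketch of an inclusion-based (Andersen-style) solver for a single incomplete module. It is not the paper's constraint graph or the jlm implementation: the class `ToyPointsTo`, its method names, the `ext` and `escaped` flags, and the naive fixpoint loop are all assumptions made for illustration. Instead of adding an explicit pointee for every externally accessible location to every affected points-to set, the sketch keeps one flag per node ("may point to external memory") and one flag per location ("has escaped to external modules").

```python
from collections import defaultdict

class ToyPointsTo:
    """Toy inclusion-based solver; all names here are hypothetical."""

    def __init__(self):
        self.pts = defaultdict(set)  # explicit pointees of each pointer or location
        self.ext = set()             # nodes that may point to external memory (implicit)
        self.escaped = set()         # locations reachable from external modules
        self.leaks = set()           # pointers whose value is handed to unknown code
        self.copies, self.loads, self.stores = [], [], []

    # Constraint builders, one per statement form.
    def addr_of(self, p, x): self.pts[p].add(x)                 # p = &x
    def copy(self, dst, src): self.copies.append((dst, src))    # dst = src
    def load(self, dst, src): self.loads.append((dst, src))     # dst = *src
    def store(self, dst, src): self.stores.append((dst, src))   # *dst = src
    def from_external(self, p): self.ext.add(p)                 # p = unknown()
    def to_external(self, p): self.leaks.add(p)                 # unknown(p)

    def _flow(self, dst, src):
        # Inclusion pts(dst) >= pts(src); the implicit flag flows along too.
        self.pts[dst] |= self.pts[src]
        if src in self.ext:
            self.ext.add(dst)

    def _size(self):
        return sum(len(s) for s in self.pts.values()) + len(self.ext) + len(self.escaped)

    def solve(self):
        # Naive fixpoint iteration; a real solver would use a worklist.
        while True:
            before = self._size()
            for p in self.leaks:
                self.escaped |= self.pts[p]      # pointees of leaked pointers escape
            for x in list(self.escaped):
                self.ext.add(x)                  # unknown code may overwrite x
                self.escaped |= self.pts[x]      # and can reach whatever x points to
            for dst, src in self.copies:
                self._flow(dst, src)
            for dst, src in self.loads:
                for x in list(self.pts[src]):
                    self._flow(dst, x)
                if src in self.ext:
                    self.ext.add(dst)            # loaded from external memory
            for dst, src in self.stores:
                for x in list(self.pts[dst]):
                    self._flow(x, src)
                if dst in self.ext:
                    self.escaped |= self.pts[src]  # stored into external memory
            if self._size() == before:
                return

    def may_alias(self, p, q):
        if self.pts[p] & self.pts[q]:
            return True
        p_ext, q_ext = p in self.ext, q in self.ext
        return bool((p_ext and q_ext)
                    or (p_ext and self.pts[q] & self.escaped)
                    or (q_ext and self.pts[p] & self.escaped))

# Module under analysis (a and b are assumed not to be externally visible):
#   int *p = &a;  int *q = &b;  publish(p);  int *r = get();
# where publish() and get() are defined in some other, unseen module.
an = ToyPointsTo()
an.addr_of("p", "a")
an.addr_of("q", "b")
an.to_external("p")
an.from_external("r")
an.solve()
r_vs_p = an.may_alias("r", "p")  # r may point to a, which escaped through publish()
r_vs_q = an.may_alias("r", "q")  # b never escaped, so r cannot reach it
```

In this toy model, `r` carries no explicit pointees at all; the single `ext` flag, combined with the `escaped` set, is enough to answer the alias queries conservatively. The sketch only hints at why avoiding explicit pointees can keep points-to sets small; it makes no claim about the speedups reported above.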
