PIP: Making Andersen's Points-to Analysis Sound and Practical for Incomplete C Programs

Håvard Rognebakke Krogstie, Helge Bahmann, Magnus Själander, Nico Reissmann

TL;DR

This work tackles the difficulty of obtaining sound points-to information for incomplete C programs by introducing an Andersen-style analysis that tracks externally accessible memory. It replaces explicit pointee tracking with an implicit representation (the Ω concept) and further refines solving with the Prefer Implicit Pointees (PIP) technique, delivering large speedups and meaningful precision gains. The method is implemented in the jlm compiler and validated on thousands of C files, taking about $1.1$ ms per file on average and producing $40\%$ fewer MayAlias responses than LLVM's BasicAA, which makes per-file analysis practical for production use. Together, these contributions enable sound, scalable, and precise points-to analysis for incomplete programs, addressing a long-standing gap in production compiler workflows.

Abstract

Compiling files individually lends itself well to parallelization, but forces the compiler to operate on incomplete programs. State-of-the-art points-to analyses guarantee sound solutions only for complete programs, requiring summary functions to describe any missing program parts. Summary functions are rarely available in production compilers, however, where soundness and efficiency are non-negotiable. This paper presents an Andersen-style points-to analysis that efficiently produces sound solutions for incomplete C programs. The analysis achieves soundness by tracking memory locations and pointers that are accessible from external modules, and efficiency by performing this tracking implicitly in the constraint graph. We show that implicit pointee tracking makes the constraint solver $15\times$ faster than any combination of five different state-of-the-art techniques using explicit pointee tracking. We also present the Prefer Implicit Pointees (PIP) technique, which further reduces the use of explicit pointees. PIP gives an additional speedup of $1.9\times$ compared to the fastest solver configuration not benefiting from PIP. The precision of the analysis is evaluated in terms of an alias-analysis client, where it reduces the number of MayAlias responses by 40% compared to LLVM's BasicAA pass alone. Finally, we show that the analysis is scalable in terms of memory, making it suitable for optimizing compilers in practice.
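
To make the idea of implicit tracking concrete, the following is a minimal illustrative sketch, not the paper's algorithm or the jlm implementation. It assumes a toy constraint language (base, copy, load, store), a naive fixed-point loop, and two hypothetical pieces of state: a set `esc` of externally accessible locations and a set `ext` of pointers flagged as "may point to anything in `esc`". The flag stands in for listing every escaped location explicitly in each points-to set. All names (`solve`, `points_to`, `esc`, `ext`) are assumptions made for this example.

```python
from collections import defaultdict

def solve(base, copy, load, store, escaped, ext_ptrs):
    """Toy Andersen-style solver with an implicit 'points to external' flag.

    base:  (p, x) meaning p = &x      copy:  (p, q) meaning p = q
    load:  (p, q) meaning p = *q      store: (p, q) meaning *p = q
    escaped:  locations visible to external modules from the start
    ext_ptrs: pointers whose value comes from an external module
    """
    sol = defaultdict(set)   # explicit pointees only
    esc = set(escaped)       # externally accessible locations
    ext = set(ext_ptrs)      # pointers that implicitly point to all of esc
    for p, x in base:
        sol[p].add(x)
    while True:
        before = (sum(len(s) for s in sol.values()), len(esc), len(ext))
        for x in list(esc):
            ext.add(x)        # external code may store any escaped address in x
            esc |= sol[x]     # external code may read x and reach its pointees
        for p, q in copy:
            sol[p] |= sol[q]
            if q in ext:
                ext.add(p)
        for p, q in load:
            if q in ext:      # loading from unknown external memory
                ext.add(p)
            for x in list(sol[q]):
                sol[p] |= sol[x]
                if x in ext:
                    ext.add(p)
        for p, q in store:
            if p in ext:      # storing into external memory leaks q's pointees
                esc |= sol[q]
            for x in list(sol[p]):
                sol[x] |= sol[q]
                if q in ext:
                    ext.add(x)
        after = (sum(len(s) for s in sol.values()), len(esc), len(ext))
        if after == before:   # all sets only grow, so equal sizes mean a fixed point
            return sol, esc, ext

def points_to(p, sol, esc, ext):
    """Expand the implicit flag into an explicit points-to set on demand."""
    return set(sol[p]) | (esc if p in ext else set())

# Hypothetical module: x is an exported global, r is returned by an external call.
#   p = &y;  t = &z;  *r = p;  q = *r;
sol, esc, ext = solve(base=[("p", "y"), ("t", "z")], copy=[],
                      load=[("q", "r")], store=[("r", "p")],
                      escaped={"x"}, ext_ptrs={"r"})
q_targets = points_to("q", sol, esc, ext)
t_targets = points_to("t", sol, esc, ext)
```

In this sketch `y` escapes because its address is stored through the externally provided pointer `r`, so `q` may point to `x` or `y`, while `z` never escapes and `t` still points only to `z`. The explicit set for `q` stays empty throughout; only the flag and the shared `esc` set grow, which is the kind of saving that implicit tracking aims for.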

Paper Structure

This paper contains 31 sections, 8 equations, 10 figures, 6 tables, 1 algorithm.

Figures (10)

  • Figure 1: Example of an incomplete program with pointers p, q, and r, all of which may point to unknown targets from external modules.
  • Figure 2: Rules of inference for sound points-to set tracking.
  • Figure 3: A set of example constraints, and the corresponding constraint graph. Note that $y \notin P$, so there is no $\operatorname{Sol}(y)$.
  • Figure 4: The solved version of the constraint graph from Figure 3. The inferred constraints are listed on the left, and colored blue in the graph.
  • Figure 5: A sample C program containing two functions and an indirect function call. The corresponding constraint graph is drawn in black. The result of applying the call inference rule to $\operatorname{Call}_1$ and $\operatorname{Func}_2$ is drawn in blue. Local variables that never have their address taken are represented by virtual registers, drawn as circles. The remaining constraint variables are abstract memory locations, drawn as squares.
  • ...and 5 more figures