Boosting Path-Sensitive Value Flow Analysis via Removal of Redundant Summaries

Yongchao Wang, Yuandao Cai, Charles Zhang

TL;DR

This work tackles the scalability challenge of path-sensitive value-flow analysis, where the bottom-up collection and cloning of function summaries cause explosive growth in memory and time. It introduces contribution identification (CI), based on a contribution abstraction, to prune non-contributing summaries, using graph reachability (via BFS) to identify the necessary heads, tails, and guard vertices without sacrificing soundness. Empirical evaluation on 17 large programs shows that CI reduces overall time by about $45\%$ and memory by about $27\%$, identifying about $79\%$ of redundant summaries on average. In the largest case, mysqld, it saves $8107$ seconds with only $17.31$ seconds of overhead, a performance gain of $468.48\times$; the average performance gain across programs is $632.1\times$. The approach preserves precision, outperforms purely top-down methods in scalability, and is orthogonal to other optimizations, making path-sensitive value-flow analysis more practical for large-scale software.

Abstract

Value flow analysis, which tracks the flow of values via data dependence, is a widely used technique for detecting a broad spectrum of software bugs. However, scalability often deteriorates when high precision (i.e., path-sensitivity) is required, as the instantiation of function summaries becomes excessively time- and memory-intensive. The primary culprit, as we observe, is the existence of redundant computations resulting from blindly computing summaries for a function, irrespective of whether they are related to the bugs being checked. To address this problem, we present the first approach that can effectively identify and eliminate redundant summaries, thereby reducing the size of collected summaries from callee functions without compromising soundness or efficiency. Our evaluation on large programs demonstrates that our identification algorithm can significantly reduce the time and memory overhead of the state-of-the-art value flow analysis by 45\% and 27\%, respectively. Furthermore, the identification algorithm demonstrates remarkable efficiency by identifying nearly 80\% of redundant summaries while incurring minimal additional overhead. In the largest project, mysqld, the identification algorithm reduces the time by 8107 seconds (2.25 hours) with a mere 17.31 seconds of additional overhead, leading to a ratio of time savings to paid overhead (i.e., performance gain) of $468.48\times$. In total, our method attains an average performance gain of $632.1\times$.
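
As a rough check of the performance-gain ratio (time savings divided by paid overhead), using only the rounded mysqld figures quoted in the abstract:

$$\text{performance gain} = \frac{\text{time saved}}{\text{overhead}} = \frac{8107\ \text{s}}{17.31\ \text{s}} \approx 468.3$$

This is close to the reported $468.48\times$; the small gap is presumably due to rounding in the quoted times.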

Paper Structure

This paper contains 22 sections, 2 theorems, 4 figures, 4 tables, 4 algorithms.

Key Result

Theorem 1

Given the set $V^{N}$ identified, for any function $f \in P$, if a summary $s=(\pi, \phi)$ is collected and neither $\pi[0]$ nor $\pi[-1]$ appears in $V^{N}$, it must be a non-contributing summary for function $f$. Canceling the corresponding operations does not affect $S(V_{\text{src}}, V_{\text{si
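
The pruning rule in Theorem 1 can be illustrated with a minimal Python sketch. The precise construction of $V^{N}$ (heads, tails, and guard vertices) is not given in this excerpt, so the sketch makes a simplifying assumption: $V^{N}$ is taken to be every vertex that lies on some path from a source to a sink, found with one forward and one backward BFS. The graph encoding and the names `bfs_reachable`, `reverse`, `necessary_vertices`, and `prune` are hypothetical and are not the paper's algorithm.

```python
from collections import deque


def bfs_reachable(graph, starts):
    """Return all vertices reachable from `starts` by BFS over `graph`."""
    seen = set(starts)
    queue = deque(starts)
    while queue:
        v = queue.popleft()
        for w in graph.get(v, ()):
            if w not in seen:
                seen.add(w)
                queue.append(w)
    return seen


def reverse(graph):
    """Build the graph with every edge flipped."""
    rev = {}
    for v, succs in graph.items():
        rev.setdefault(v, [])
        for w in succs:
            rev.setdefault(w, []).append(v)
    return rev


def necessary_vertices(graph, sources, sinks):
    """ASSUMPTION: approximate V^N as the vertices on some source-to-sink path."""
    forward = bfs_reachable(graph, sources)
    backward = bfs_reachable(reverse(graph), sinks)
    return forward & backward


def prune(summaries, v_n):
    """Keep a summary (path, condition) only if its first or last vertex is in V^N.

    This mirrors the condition of Theorem 1: a summary whose endpoints both
    fall outside V^N is treated as non-contributing and dropped.
    """
    return [(path, cond) for path, cond in summaries
            if path[0] in v_n or path[-1] in v_n]


# Toy dependence graph: src -> a -> b -> sink, plus an unrelated edge c -> d.
graph = {"src": ["a"], "a": ["b"], "b": ["sink"], "c": ["d"]}
v_n = necessary_vertices(graph, {"src"}, {"sink"})
summaries = [(["a", "b"], "phi_1"), (["c", "d"], "phi_2")]
kept = prune(summaries, v_n)
```

In this toy graph, the summary over `c -> d` is dropped because neither endpoint can take part in a source-to-sink flow, while the summary over `a -> b` is kept.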

Figures (4)

  • Figure 1: Bottom-up analysis for the code shown in (a). (b) shows the corresponding program dependence graph (PDG). (c) shows the partial function summaries collected during the bottom-up analysis. Redundant summaries are highlighted in red.
  • Figure 2: Light-Fusion vs. its variants
  • Figure 3: Performance: Fusion vs. Light-Fusion vs. PhASAR.
  • Figure 4: An illustrative example in which the summary $\pi_{2}$ is removed due to the unsatisfiable summary condition $\phi_{\pi_{2}}$. Bottom-up analysis for the code shown in (a). (b) shows the corresponding program dependence graph (PDG). (c) shows the function summaries collected during the bottom-up analysis.

Theorems & Definitions (13)

  • Definition 1
  • Example 1
  • Example 2
  • Definition 2: Function Summary
  • Example 3
  • Example 4
  • Definition 3: Contributing Summary
  • Example 5
  • Theorem 1: Soundness
  • Example 6
  • ...and 3 more