Call Graph Soundness in Android Static Analysis

Jordan Samhi, René Just, Tegawendé F. Bissyandé, Michael D. Ernst, Jacques Klein

TL;DR

This work investigates why Android static analysis is often unsound by quantifying the omissions in call-graph construction. By compiling and comparing static call graphs from 13 analyzers against a dynamic ground truth across 1000 real apps, the authors show that a substantial fraction of runtime methods are missed, with only 58‰ of apps analyzable within a 1-hour limit and significant variance across tools. The study reveals a counterintuitive trade-off: more precise call-graph construction consistently increases unsoundness, largely due to unmodeled entry-point methods and implicit framework mechanisms. The results argue for a shift in static-analysis research toward improving soundness, particularly around entry points and framework callbacks, and provide public artifacts to facilitate reproducibility and further work.

Abstract

Static analysis is sound in theory, but an implementation may unsoundly fail to analyze all of a program's code. Any such omission is a serious threat to the validity of the tool's output. Our work is the first to measure the prevalence of these omissions. Previously, researchers and analysts did not know what is missed by static analysis, what sort of code is missed, or the reasons behind these omissions. To address this gap, we ran 13 static analysis tools and a dynamic analysis on 1000 Android apps. Any method observed by the dynamic analysis but absent from a static analysis is an unsoundness. Our findings include the following. (1) Apps built around external frameworks challenge static analyzers. On average, the 13 static analysis tools failed to capture 61% of the dynamically-executed methods. (2) A high level of precision in call graph construction is a synonym for a high level of unsoundness. (3) No existing approach significantly improves static analysis soundness. This includes approaches specifically tailored for a given mechanism, such as DroidRA to address reflection. It also includes systematic approaches, such as EdgeMiner, which captures all callbacks in the Android framework. (4) Modeling entry point methods challenges call graph construction, which jeopardizes soundness.
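The unsoundness criterion stated in the abstract (a method seen by the dynamic analysis but absent from a static call graph) amounts to a set difference. The following is a minimal sketch of that comparison, not the authors' tooling: the function name `compare_call_graphs`, the `SoundnessReport` type, the representation of methods as plain signature strings, and the sample signatures are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SoundnessReport:
    missed: frozenset   # methods executed at run time but absent from the static call graph
    recall: float       # share of executed methods that the static call graph contains
    miss_rate: float    # share of executed methods that the static call graph lacks


def compare_call_graphs(static_methods, dynamic_methods):
    """Compare one static call graph against the dynamic ground truth.

    Both arguments are iterables of method signatures. Plain strings are
    used here, which is an assumption made for illustration.
    """
    static_set = frozenset(static_methods)
    dynamic_set = frozenset(dynamic_methods)
    if not dynamic_set:
        raise ValueError("dynamic ground truth is empty; recall is undefined")

    # Every executed method that the static analysis never reached.
    missed = dynamic_set - static_set
    miss_rate = len(missed) / len(dynamic_set)
    return SoundnessReport(missed=missed, recall=1.0 - miss_rate, miss_rate=miss_rate)


# Hypothetical signatures, not taken from the study's data.
dynamic = {
    "<com.example.MainActivity: void onCreate(android.os.Bundle)>",
    "<com.example.MainActivity: void onResume()>",
    "<com.example.Net: void fetch()>",
    "<com.example.Plugin: void run()>",
}
static = {
    "<com.example.MainActivity: void onCreate(android.os.Bundle)>",
    "<com.example.Net: void fetch()>",
    "<com.example.Unused: void helper()>",
}

report = compare_call_graphs(static, dynamic)
print(sorted(report.missed))
print(f"miss rate: {report.miss_rate:.0%}")
```

In this toy input, two of the four executed methods are absent from the static set, so the miss rate is 50%. Methods that appear only in the static set (such as the hypothetical `Unused.helper`) do not affect this measure; they bear on precision, not on soundness.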

Paper Structure (20 sections, 9 figures, 8 tables)

This paper contains 20 sections, 9 figures, 8 tables.

Figures (9)

  • Figure 1: The number of methods called at run time.
  • Figure 2: Number of apps successfully analyzed per tool.
  • Figure 3: Comparison of recall, precision, and F$_1$ score of all configurations of tools and call graph construction algorithms.
  • Figure 4: Proportion of dynamically-executed methods missed by FlowDroid-CHA.
  • Figure 5: Code Coverage of the dynamic analysis for the 126 apps successfully analyzed by all tools. The code coverage is at the method level and is expressed in %.
  • ...and 4 more figures