Context-Sensitive Abstract Interpretation of Dynamic Languages

Franciszek Piszcz

TL;DR

The thesis targets the tooling gap between static and dynamic languages by proposing a context-sensitive abstract interpretation framework for dynamic languages. It develops TinyScript, a minimal JavaScript subset, and its closure-converted intermediate form to enable precise static analysis of dynamic features such as metaprogramming and runtime reflection. The core contribution is a context-sensitive analysis with heap specialization that yields precise control/data-flow information, along with a practical prototype implemented in Rust. This work lays the groundwork for improving IDE features like navigation, autocompletion, and refactoring in dynamic languages, with a path toward scaling to real-world JavaScript and Python codebases through incremental, on-demand analysis. The approach demonstrates that advanced static analyses can bridge the gap between dynamism and tooling, enabling more reliable developer workflows in dynamic-language ecosystems.

Abstract

There is a vast gap in the quality of IDE tooling between static languages like Java and dynamic languages like Python or JavaScript. Modern frameworks and libraries in these languages heavily use their dynamic capabilities to achieve the best ergonomics and readability. This has a side effect of making the current generation of IDEs blind to control flow and data flow, which often breaks navigation, autocompletion and refactoring. In this thesis we propose an algorithm that can bridge this gap between tooling for dynamic and static languages by statically analyzing dynamic metaprogramming and runtime reflection in programs. We use a technique called abstract interpretation to partially execute programs and extract information that is usually only available at runtime. Our algorithm has been implemented in a prototype analyzer that can analyze programs written in a subset of JavaScript.

Paper Structure (25 sections, 60 equations, 5 figures)

Figures (5)

  • Figure 1: A program that demonstrates separating data from different calls to one function. It is also an example of using pointwise operations on our lattice to combine context-specific and context-independent data. Rectangles contain abstract values of expressions, rounded rectangles denote operations and arrows indicate direction of computation.
  • Figure 2: Context-sensitive analysis of imperative code. The functions enter_ctx and exit_ctx are lifted to operate on abstract states. The compositional nature of these functions transparently enables context-sensitivity for the entire abstract heap. Mutable manipulation of heap-allocated objects preserves contexts in our value lattice. Addresses on the abstract heap are denoted by ${\texttt{\#}}_{1}$ and ${\texttt{\#}}_{2}$.
  • Figure 3: An example of why strong updates cannot be used with allocation-site abstraction. A single abstract object with address ${\texttt{\#}}_{1}$ represents both a and b. When b is initialized, the field x of both a and b is overwritten, so the expression a.x on the next line evaluates to an incorrect value.
  • Figure 4: Weak updates allow us to retain soundness when using allocation-site abstraction. Assigning $\{1, 2\}$ to a.x is a sound but imprecise solution.
  • Figure 5: An example of heap specialization. Inside the scope of create there is a single abstract object with address ${\texttt{\#}}_{1}$. However, upon exiting to main, it is specialized as ${\texttt{\#}}_{1/\alpha}$ or ${\texttt{\#}}_{1/\beta}$, depending on the call site. This enables us to distinguish two objects that allocation-site abstraction would normally mix up, and additionally we can perform strong updates on them. Auxiliary functions enter_call and exit_call are defined later in a section about abstract states. They are similar to enter_ctx and exit_ctx, but additionally handle heap specialization.
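
The captions of Figures 3 to 5 contrast strong updates, weak updates, and heap specialization. The sketch below illustrates that contrast on a toy abstract heap. It is not the thesis's implementation: the class `AbstractHeap`, both analysis functions, the call-site labels `alpha` and `beta`, and the analyzed program (`a = create(1); b = create(2)`, where `create` allocates one object and sets its field `x`) are all assumptions made for illustration. In particular, the renaming step only loosely imitates what the captions attribute to exit_call.

```python
class AbstractHeap:
    """Toy abstract heap: (address, field) -> set of possible values."""

    def __init__(self):
        self.fields = {}

    def strong_update(self, addr, field, values):
        # Overwrite the old value. Sound only if addr denotes one concrete object.
        self.fields[(addr, field)] = set(values)

    def weak_update(self, addr, field, values):
        # Join with the old value, because addr may denote several concrete objects.
        self.fields.setdefault((addr, field), set()).update(values)

    def read(self, addr, field):
        return set(self.fields.get((addr, field), set()))


def analyze_allocation_site(use_strong_updates):
    """Analyze a = create(1); b = create(2) with one abstract object "#1"."""
    heap = AbstractHeap()
    for value in (1, 2):
        if use_strong_updates:
            heap.strong_update("#1", "x", {value})
        else:
            heap.weak_update("#1", "x", {value})
    # Both a and b point to "#1", so this is the abstract value of a.x.
    return heap.read("#1", "x")


def analyze_with_specialization():
    """Same program, but "#1" is renamed per call site when create returns."""
    caller_heap = AbstractHeap()
    variables = {}
    for name, call_site, value in (("a", "alpha", 1), ("b", "beta", 2)):
        # Inside create there is a single abstract object "#1".
        callee_heap = AbstractHeap()
        callee_heap.strong_update("#1", "x", {value})
        # Hypothetical exit step: specialize the address by call site.
        specialized = f"#1/{call_site}"
        caller_heap.strong_update(specialized, "x", callee_heap.read("#1", "x"))
        variables[name] = specialized
    return {name: caller_heap.read(addr, "x") for name, addr in variables.items()}


if __name__ == "__main__":
    print(analyze_allocation_site(True))   # {2}: unsound, since a.x is really 1
    print(analyze_allocation_site(False))  # {1, 2}: sound but imprecise
    print(analyze_with_specialization())   # {'a': {1}, 'b': {2}}
```

With strong updates on a shared address, the value written for b silently replaces the one written for a. Weak updates keep both values at the cost of precision. Giving each call site its own address recovers precise values for a and b, and strong updates on the specialized addresses remain safe in this toy program because each one denotes a single object.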