Table of Contents
Fetching ...

Abstract String Domain Defined with Word Equations as a Reduced Product (Extended Version)

Antonina Nepeivoda, Ilya Afanasyev

TL;DR

This work introduces a string-interval abstract domain built from word equations and word disequalities to bound string values, structured as a reduced product over a string-property lattice. Abstract string objects leverage length-non-increasing morphisms to capture value, length, and additional structural properties (prefixes, suffixes, factors, and forbidden factors), enabling JavaScript string analysis through a unified, lattice-based framework. The paper develops multiple reduction strategies—standalone, cross-reduction, and unary-bound refinements—to guarantee lattice stability while maintaining practical precision, and it demonstrates lightweight abstractions for common string operations such as concatenation, indexing, substring, and replacement. It also provides illustrative examples (split with guards, a looping sanitizer, and letter-preservation effects) and discusses dense-language considerations, complexity (noting NP-hardness for perfect reductions in general), and related work, offering a foundation for precise static analysis of string-manipulating programs and avenues for future work in dynamic languages and solver integration.

Abstract

We introduce a string-interval abstract domain, where string intervals are characterized by systems of word equations (encoding lower bounds on string values) and word disequalities (encoding upper bounds). Building upon the lattice structure of string intervals, we define an abstract string object as a reduced product on a string property semilattice, determined by length-non-increasing morphisms. We consider several reduction strategies for abstract string objects and show that upon these strategies the string object domain forms a lattice. We define basic abstract string operations on the domain, aiming to minimize computational overheads on the reduction, and show how the domain can be used to analyse properties of JavaScript string manipulating programs.

Abstract String Domain Defined with Word Equations as a Reduced Product (Extended Version)

TL;DR

This work introduces a string-interval abstract domain built from word equations and word disequalities to bound string values, structured as a reduced product over a string-property lattice. Abstract string objects leverage length-non-increasing morphisms to capture value, length, and additional structural properties (prefixes, suffixes, factors, and forbidden factors), enabling JavaScript string analysis through a unified, lattice-based framework. The paper develops multiple reduction strategies—standalone, cross-reduction, and unary-bound refinements—to guarantee lattice stability while maintaining practical precision, and it demonstrates lightweight abstractions for common string operations such as concatenation, indexing, substring, and replacement. It also provides illustrative examples (split with guards, a looping sanitizer, and letter-preservation effects) and discusses dense-language considerations, complexity (noting NP-hardness for perfect reductions in general), and related work, offering a foundation for precise static analysis of string-manipulating programs and avenues for future work in dynamic languages and solver integration.

Abstract

We introduce a string-interval abstract domain, where string intervals are characterized by systems of word equations (encoding lower bounds on string values) and word disequalities (encoding upper bounds). Building upon the lattice structure of string intervals, we define an abstract string object as a reduced product on a string property semilattice, determined by length-non-increasing morphisms. We consider several reduction strategies for abstract string objects and show that upon these strategies the string object domain forms a lattice. We define basic abstract string operations on the domain, aiming to minimize computational overheads on the reduction, and show how the domain can be used to analyse properties of JavaScript string manipulating programs.

Paper Structure

This paper contains 22 sections, 12 theorems, 14 equations, 9 figures, 1 algorithm.

Key Result

Proposition 5.1

Figures (9)

  • Figure 1: Types of basic string intervals
  • Figure 2: String interval wrt the subword ordering represented by the bounds $\omega\in\mathit{\mathbf{Sol}}_Z\{Z = X_1 a Y_1, Z = X_2 b Y_2\}$ and $\omega\in\mathit{\mathbf{Sol}}_Z\{Z\neq X_1 aa Y_1, Z\neq X_2 bb Y_2, Z\neq X_3 aba Y_3\}$ in $\Sigma=\{a,b\}$, and its length interval $[3,4)$. Nodes satifying the lower bound are filled with blue; nodes violating the upper bound condition are circled in red.
  • Figure 3: A string property as a Cartesian product
  • Figure 4: Morphisms ordering, words preserved by the morphisms (given in blue boxes), and equation propagation via reduction. Only the lower bounds are considered, and the length lower bound is given as a set of equations for uniformity.
  • Figure 5: Example of object operations with distinct morphisms semilattices
  • ...and 4 more figures

Theorems & Definitions (35)

  • Definition 2.1
  • Definition 2.2
  • Definition 2.3
  • Example 2.1
  • Definition 3.1
  • Example 3.1
  • Definition 4.1
  • Definition 5.1
  • Proposition 5.1
  • Example 5.1
  • ...and 25 more