Composing Copyless Streaming String Transducers

Rajeev Alur; Taylor Dohmen; Ashutosh Trivedi

TL;DR

This paper analyzes the sequential composition of copyless streaming string transducers (SSTs) and shows that naive composition can yield copyful behavior. It introduces the diamond-free subclass, proving that the composed transducer is diamond-free and that all copyful behavior is superficial with respect to the final output length. A detailed, higher-order, parametric construction is provided for composing deterministic and nondeterministic SSTs, including state/shape/assignment summaries and a synchronization mechanism for nondeterminism. The authors also show how to convert any diamond-free NSST into an equivalent copyless NSST, yielding a complete approach to composing copyless SSTs directly. The results solidify the foundations for SST composition and complement the composition closure already known through the equivalence of copyless SSTs and MSOTs.

Abstract

Streaming string transducers (SSTs) implement string-to-string transformations by reading each input word in a single left-to-right pass while maintaining fragments of potential outputs in a finite set of string variables. These variables get updated on transitions of the transducer, where they can be assigned new values described by concatenations of variables and output symbols. An SST is called copyless if every update is such that no variable occurs more than once amongst all of the assigned expressions. The transformations realized by copyless SSTs coincide with Courcelle's monadic second-order logic graph transducers (MSOTs) when restricted to string graphs. Copyless SSTs with nondeterminism are known to be equivalent to nondeterministic MSOTs as well. MSOTs, both deterministic and nondeterministic, are closed under composition. Given the equivalence of MSOTs and copyless SSTs, it is easy to see that copyless SSTs are also closed under composition. The original proof of this fact, however, was based on a direct construction to produce a composite copyless SST from two given copyless SSTs. A counterexample discovered by Joost Engelfriet showed that this construction may produce copyful transducers. We revisit the original composition constructions for both deterministic and nondeterministic SSTs and show that, although they can introduce copyful updates, the resulting copyful behavior they exhibit is superficial. To characterize this mild copyful behavior, we define a subclass of copyful SSTs, called diamond-free SSTs, in which two copies of a common variable are never combined in any subsequent assignment. In order to recover a modified version of the original construction, we provide a method for producing an equivalent copyless SST from any diamond-free copyful SST.
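The single-pass update mechanism and the copyless condition described in the abstract can be illustrated with a small sketch. This is not the paper's formalization: the token encoding, the function names (`is_copyless`, `apply_assignment`, `run`), the single designated output variable, and the example transducer `REVERSE` are all assumptions made for illustration, and every assignment is assumed to give a right-hand side for every variable.

```python
from collections import Counter

# Hypothetical encoding: an assignment maps each variable to a list of tokens,
# where a token is ("var", name) or ("sym", output_symbol).

def is_copyless(assignment):
    """True if no variable occurs more than once across all right-hand sides."""
    uses = Counter(
        tok[1] for rhs in assignment.values() for tok in rhs if tok[0] == "var"
    )
    return all(count <= 1 for count in uses.values())

def apply_assignment(valuation, assignment):
    """Update all variables simultaneously, reading only the old valuation."""
    def evaluate(rhs):
        return "".join(valuation[t[1]] if t[0] == "var" else t[1] for t in rhs)
    return {x: evaluate(rhs) for x, rhs in assignment.items()}

def run(transitions, output_var, word, variables, start="q0"):
    """Single left-to-right pass; the output is the final value of output_var.

    Simplifying assumption: one designated output variable instead of a
    general output function.
    """
    state = start
    valuation = {x: "" for x in variables}
    for symbol in word:
        state, assignment = transitions[(state, symbol)]
        valuation = apply_assignment(valuation, assignment)
    return valuation[output_var]

# Hypothetical example: reverse a word over {a, b} via x := c x on reading c.
REVERSE = {
    ("q0", c): ("q0", {"x": [("sym", c), ("var", "x")]}) for c in "ab"
}

# Two copyful updates: x := x x, and x used on two different right-hand sides.
DOUBLE = {"x": [("var", "x"), ("var", "x")]}
FORK = {"x": [("var", "x")], "y": [("var", "x")]}
```

Here every update in `REVERSE` uses `x` exactly once, so it is copyless, while `DOUBLE` and `FORK` each use `x` twice and are therefore copyful in the sense of the abstract.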

Paper Structure (19 sections, 19 theorems, 26 equations, 10 figures, 1 algorithm)

Introduction
Contributions and Outline.
Related Work.
Streaming String Transducers
Copying Variables
Flow Graphs and Diamonds
Diamond-Freeness.
Composition Construction
DSST Composition
State Transition Summarizer.
Assignment Summarizer.
Shape Generator.
Assignment Generator.
NSST Composition
Correctness and Size Bounds
...and 4 more sections

Key Result

Proposition 2.3

If $\left| X \right| = n$ and $n \geq 2$, then $\left| \boldsymbol{[} X \boldsymbol{]} \right| = \left\lfloor e \, n! \right\rfloor$.
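As a sanity check on this count, the sketch below assumes that the bracketed set consists of all strings over $X$ in which each variable occurs at most once. That reading is an assumption, since this summary does not define the notation; under it, the count is $\sum_{k=0}^{n} n!/(n-k)!$, which the code compares against the closed form by brute-force enumeration. The function names are illustrative.

```python
import math
from itertools import permutations

def count_repetition_free_strings(n):
    """Count strings over n distinct variables with no variable repeated.

    Enumerates every ordered selection of k variables for k = 0..n.
    """
    variables = range(n)
    return sum(1 for k in range(n + 1) for _ in permutations(variables, k))

def closed_form(n):
    """floor(e * n!), evaluated in floating point (fine for small n only)."""
    return math.floor(math.e * math.factorial(n))

if __name__ == "__main__":
    for n in range(2, 7):
        print(n, count_repetition_free_strings(n), closed_form(n))
```

For small $n$ the two columns agree; for large $n$ the floating-point evaluation of $e \, n!$ would lose precision, so the enumeration or the exact sum should be preferred there.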

Figures (10)

Figure 1: Representative examples of copyless and copyful SSTs. In \ref{subfig:copyless}, $\sigma$ denotes an arbitrary symbol of an arbitrary alphabet.
Figure 2: Two copyless SSTs $T_1$, $T_2$ and a copyless SST $T_3$ implementing their composition. An edge labeled by $a \backslash \alpha$ indicates that the machine transitions from the edge's source state to its target state upon reading symbol $a$ and applies the assignment $\alpha$ to its set of variables.
Figure 3: A pair of two-step flow graphs from $T$ and $T_3$. Four diamonds are present in \ref{subfig:diamond}. Zero diamonds are present in \ref{subfig:no_diamond}.
Figure 4: Schematic of this paper's approach to composing copyless SSTs.
Figure 5: An illustration of the relationship between an assignment summary $h = \mathcal{G}\left( p, \alpha(x), f, g \right)$ and the corresponding shape $g^p_x = \mathcal{S}^p_x(h)$ and assignment $\gamma^p_x = \mathcal{A}^p_x(h)$.
...and 5 more figures

Theorems & Definitions (46)

Definition 2.1: SST Syntax
Definition 2.2: SST Semantics
Proposition 2.3
proof
Definition 2.4: Flow Graph
Definition 2.5: Diamond
Definition 3.1: DSST Composition
Definition 3.2: NSST Composition
Remark 3.3
Lemma 3.4
...and 36 more
