Composing Copyless Streaming String Transducers
Rajeev Alur, Taylor Dohmen, Ashutosh Trivedi
TL;DR
This paper analyzes the sequential composition of copyless streaming string transducers (SSTs) and shows that the original direct composition construction can yield copyful updates. It introduces the diamond-free subclass, proving that the transducer produced by the construction is diamond-free and that all of its copyful behavior is superficial with respect to the final output length. A detailed, higher-order, parametric construction is provided for composing deterministic and nondeterministic SSTs, including state, shape, and assignment summaries and a synchronization mechanism for nondeterminism. The authors also show how to convert any diamond-free NSST into an equivalent copyless NSST, yielding a complete approach to composing copyless SSTs directly. The results solidify the foundations of SST composition and connect to MSOTs through the known equivalence between copyless SSTs and MSOTs.
Abstract
Streaming string transducers (SSTs) implement string-to-string transformations by reading each input word in a single left-to-right pass while maintaining fragments of potential outputs in a finite set of string variables. These variables are updated on transitions of the transducer, where they can be assigned new values described by concatenations of variables and output symbols. An SST is called copyless if every update is such that no variable occurs more than once amongst all of the assigned expressions. The transformations realized by copyless SSTs coincide with Courcelle's monadic second-order logic graph transducers (MSOTs) when restricted to string graphs. Copyless SSTs with nondeterminism are known to be equivalent to nondeterministic MSOTs as well. MSOTs, both deterministic and nondeterministic, are closed under composition. Given the equivalence of MSOTs and copyless SSTs, it is easy to see that copyless SSTs are also closed under composition. The original proof of this fact, however, was based on a direct construction that produces a composite copyless SST from two given copyless SSTs. A counterexample discovered by Joost Engelfriet showed that this construction may produce copyful transducers. We revisit the original composition constructions for both deterministic and nondeterministic SSTs and show that, although they can introduce copyful updates, the copyful behavior they exhibit is superficial. To characterize this mild copyful behavior, we define a subclass of copyful SSTs, called diamond-free SSTs, in which two copies of a common variable are never combined in any subsequent assignment. In order to recover a modified version of the original construction, we provide a method for producing an equivalent copyless SST from any diamond-free copyful SST.
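To make the definitions above concrete, the following is a minimal Python sketch of a deterministic SST run, a copylessness check on a single update, and a diamond check on a sequence of updates. The tuple encoding of expressions and the names `is_copyless`, `run_dsst`, and `has_diamond` are hypothetical illustrations, not the paper's notation. In particular, `has_diamond` assumes one plausible reading of "diamond": in the variable flow graph of a run, two distinct paths lead from the same variable at some step to the same variable at a later step, meaning two copies were eventually combined. This is not the paper's composition or conversion construction.

```python
from collections import Counter
from typing import Dict, List, Optional, Sequence, Tuple

# Hypothetical encoding: an expression is a list of items, either
# ("v", name) for a string variable or ("s", symbol) for an output symbol.
Item = Tuple[str, str]
Expr = List[Item]
Update = Dict[str, Expr]  # simultaneous assignment x := expr for each x


def is_copyless(update: Update) -> bool:
    """True if no variable occurs more than once across all right-hand sides."""
    uses = Counter(n for rhs in update.values() for k, n in rhs if k == "v")
    return all(c <= 1 for c in uses.values())


def evaluate(expr: Expr, val: Dict[str, str]) -> str:
    """Concatenate variable contents and output symbols in order."""
    return "".join(val[n] if k == "v" else n for k, n in expr)


def run_dsst(variables, q0, delta, updates, output, word) -> Optional[str]:
    """One left-to-right pass of a deterministic SST.

    delta[(q, a)] is the next state, updates[(q, a)] is an Update defining
    every variable, and output[q] is the final expression (a missing entry
    means the word is rejected and None is returned).
    """
    q, val = q0, {x: "" for x in variables}
    for a in word:
        upd = updates[(q, a)]
        # All right-hand sides read the old valuation (simultaneous update).
        val = {x: evaluate(upd[x], val) for x in variables}
        q = delta[(q, a)]
    return evaluate(output[q], val) if q in output else None


def has_diamond(updates: Sequence[Update]) -> bool:
    """Assumed reading: look for two paths between nodes of the flow graph.

    Node (y, t) is variable y before step t; each occurrence of y in the
    right-hand side of x at step t is an edge (y, t) -> (x, t + 1).
    paths[x] counts, per earlier node, the paths reaching the current x.
    """
    paths: Dict[str, Counter] = {}
    for t, upd in enumerate(updates):
        new: Dict[str, Counter] = {}
        for x, rhs in upd.items():
            c: Counter = Counter()
            for k, y in rhs:
                if k == "v":
                    c[(y, t)] += 1              # the direct edge
                    c.update(paths.get(y, {}))  # paths that pass through y
            if any(n >= 2 for n in c.values()):
                return True  # two copies of one variable were combined
            new[x] = c
        paths = new
    return False


if __name__ == "__main__":
    X, Y = ("v", "x"), ("v", "y")
    # Copyless SST mapping w to w followed by its reverse: x := x a, y := a y.
    upd = {("q", a): {"x": [X, ("s", a)], "y": [("s", a), Y]} for a in "ab"}
    delta = {("q", a): "q" for a in "ab"}
    print(run_dsst(["x", "y"], "q", delta, upd, {"q": [X, Y]}, "ab"))  # abba
    copy = {"x": [X], "y": [X]}  # copyful: x is used twice
    grow = {"x": [X, ("s", "a")], "y": [Y, ("s", "b")]}
    print(is_copyless(copy), has_diamond([copy, grow]))  # False False
    print(has_diamond([copy, grow, {"x": [X, Y], "y": []}]))  # True
```

Because a copyless update uses each variable at most once, every node of the flow graph has at most one outgoing edge, so `has_diamond` can never fire on a run made only of copyless updates. The `copy` example shows a copyful prefix that stays diamond-free until its two copies of `x` are concatenated back into one variable.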
