Table of Contents
Fetching ...

WEX: Formal Specifications for Windows in Stream Processing

S Hitarth, M. Praveen

TL;DR

Windowing in streaming data is addressed by formalizing Window EXpressions (WEX), a MSO-based specification that can express complex windows beyond traditional time- and count-based operators. WEX has two equivalent representations via Symbolic MSO and Symbolic Regular Expressions, enabling automatic window extraction. The framework introduces $k$-symbolic lookback automata ($k$-SLA) and proves their equivalence to S-MSO, making online, $0$-delay windowing possible. It analyzes the decidability of overlapping windows, showing undecidable in general but decidable for finite alphabets or completion-property theories, and sketches a pane-based streaming processor implementation.

Abstract

A key operation in processing an unbounded data stream is windowing, which extracts finite portions of streams for further handling. The existing frameworks and query languages either require windows to be defined using ad hoc imperative languages or are limited to rudimentary constructs such as time- or count-based windows. We propose Window EXpression, a formal specification for precisely expressing windowing constructs based on monadic second-order logic. WEX can naturally express traditional windowing constructs such as sliding windows and tumbling windows, as well as more complex windows whose start and end indices are triggered based on the satisfaction of given logical conditions. After introducing a model of symbolic automata with lookbacks over an alphabet theory, we present another equivalent representation of WEX based on symbolic regular expressions. The precise semantics of windowing enable static analysis over WEX. In particular, we show that, in general, it is undecidable to check whether a WEX allows an unbounded number of overlapping windows. However, when the data stream is over a finite alphabet, or the alphabet theory has the so-called completion property, the problem becomes decidable.

WEX: Formal Specifications for Windows in Stream Processing

TL;DR

Windowing in streaming data is addressed by formalizing Window EXpressions (WEX), a MSO-based specification that can express complex windows beyond traditional time- and count-based operators. WEX has two equivalent representations via Symbolic MSO and Symbolic Regular Expressions, enabling automatic window extraction. The framework introduces -symbolic lookback automata (-SLA) and proves their equivalence to S-MSO, making online, -delay windowing possible. It analyzes the decidability of overlapping windows, showing undecidable in general but decidable for finite alphabets or completion-property theories, and sketches a pane-based streaming processor implementation.

Abstract

A key operation in processing an unbounded data stream is windowing, which extracts finite portions of streams for further handling. The existing frameworks and query languages either require windows to be defined using ad hoc imperative languages or are limited to rudimentary constructs such as time- or count-based windows. We propose Window EXpression, a formal specification for precisely expressing windowing constructs based on monadic second-order logic. WEX can naturally express traditional windowing constructs such as sliding windows and tumbling windows, as well as more complex windows whose start and end indices are triggered based on the satisfaction of given logical conditions. After introducing a model of symbolic automata with lookbacks over an alphabet theory, we present another equivalent representation of WEX based on symbolic regular expressions. The precise semantics of windowing enable static analysis over WEX. In particular, we show that, in general, it is undecidable to check whether a WEX allows an unbounded number of overlapping windows. However, when the data stream is over a finite alphabet, or the alphabet theory has the so-called completion property, the problem becomes decidable.
Paper Structure (1 section)

This paper contains 1 section.

Table of Contents

  1. Introduction