Online Computation of String Net Frequency
Peaker Guo, Seeun William Umboh, Anthony Wirth, Justin Zobel
TL;DR
This work tackles online computation of the net frequency $\phi(S)$ of a string $S$ in a text $T$, introducing the SINGLE-NF and ALL-NF problems for streaming texts. The authors develop suffix-tree-based methods, including a new NF characteristic and Weiner-link techniques, to achieve optimal-time solutions: $O(m)$ for online SINGLE-NF and $O(n)$ for online ALL-NF under a constant-size alphabet. They provide both offline foundations and online adaptations, building on Ukkonen's online suffix tree construction and results on implicit nodes to handle dynamic updates. The results demonstrate that online NF computation can match offline efficiency, enabling fast querying and reporting for strings with positive NF in streaming texts, with potential practical impact in NLP and related string-processing tasks.
Abstract
The net frequency (NF) of a string, of length $m$, in a text, of length $n$, is the number of occurrences of the string in the text with unique left and right extensions. Recently, Guo et al. [CPM 2024] showed that NF is combinatorially interesting and that two key problems can be solved efficiently in the offline setting. First, SINGLE-NF: reporting the NF of a query string in an input text. Second, ALL-NF: reporting an occurrence and the NF of each string of positive NF in an input text. For many applications, however, facilitating these computations in an online manner is highly desirable. We are the first to solve the above two problems in the online setting, and we do so in optimal time, assuming, as is common, a constant-size alphabet: SINGLE-NF in $O(m)$ time and ALL-NF in $O(n)$ time. Our results are achieved by first designing new and simpler offline algorithms using suffix trees, proving additional properties of NF, and exploiting Ukkonen's online suffix tree construction algorithm and results on implicit node maintenance in an implicit suffix tree by Breslauer and Italiano.
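
To make the definition concrete, the following is a minimal brute-force sketch of SINGLE-NF in Python. It is not the suffix-tree algorithm of the paper and runs in time far from the stated $O(m)$ bound. It rests on assumptions not spelled out in the abstract: the text is padded with sentinel symbols `^` and `$` that do not occur in the text or the query, a string that occurs fewer than twice has NF zero, and an extension is "unique" when it occurs exactly once in the padded text. The function names are hypothetical.

```python
def occurrences(text, s):
    """Start positions of all (possibly overlapping) occurrences of s in text."""
    return [i for i in range(len(text) - len(s) + 1) if text.startswith(s, i)]


def net_frequency(text, s, left="^", right="$"):
    """Brute-force NF of s in text (illustrative sketch, not the paper's method).

    Assumptions: the sentinels `left` and `right` occur in neither `text` nor `s`,
    and a string occurring fewer than twice has NF zero.
    """
    padded = left + text + right
    occ = occurrences(padded, s)
    if len(occ) < 2:
        return 0  # assumed convention: unique or absent strings have NF zero
    nf = 0
    for i in occ:
        # Because of the sentinels, every occurrence has one symbol on each side.
        left_ext = padded[i - 1 : i + len(s)]    # x S
        right_ext = padded[i : i + len(s) + 1]   # S y
        # Count the occurrence only if both extensions occur exactly once.
        if len(occurrences(padded, left_ext)) == 1 and len(occurrences(padded, right_ext)) == 1:
            nf += 1
    return nf
```

Under these assumptions, `net_frequency("abab", "ab")` counts both occurrences of `ab`, since `^ab`, `aba`, `bab`, and `ab$` each occur once in the padded text, whereas `net_frequency("abab", "a")` is zero because the right extension `ab` is repeated.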
