The Inverse Lyndon Array: Definition, Properties, and Linear-Time Construction

Pietro Negri; Manuel Sica; Rocco Zaccagnino; Rosalba Zizza

Abstract

The Lyndon array stores, at each position of a word, the length of the longest Lyndon subword starting at that position. It plays an important role in combinatorics on words, for example in the construction of fundamental data structures such as the suffix array. In this paper, we introduce the Inverse Lyndon Array, the analogous structure for inverse Lyndon words, namely words that are lexicographically greater than all their proper suffixes. Unlike standard Lyndon words, inverse Lyndon words may have non-trivial borders, which introduces a genuine theoretical difficulty. We show that the inverse Lyndon array can be characterized in terms of the next greater suffix array together with a border-correction term, and prove that this correction coincides with a longest common extension (LCE) value. Building on this characterization, we adapt the nearest-suffix framework underlying Ellert's linear-time construction of the Lyndon array to the inverse setting, obtaining an $O(n)$-time algorithm for general ordered alphabets. Finally, we discuss implications for suffix comparison and report experiments on random, structured, and real datasets showing that the inverse construction exhibits the same practical linear-time behavior as the standard one.
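As a reading aid for the definitions in the abstract, the following minimal Python sketch checks them by brute force. It is not the paper's algorithm: the function names are hypothetical, the running time is far from linear, and the sentinel-free word `banana` is an assumption chosen only for illustration.

```python
def is_lyndon(w):
    # Lyndon word: strictly smaller than every proper non-empty suffix.
    return len(w) > 0 and all(w < w[k:] for k in range(1, len(w)))


def is_inverse_lyndon(w):
    # Inverse Lyndon word: strictly greater than every proper non-empty suffix.
    return len(w) > 0 and all(w > w[k:] for k in range(1, len(w)))


def borders(w):
    # Non-trivial borders: proper non-empty prefixes that are also suffixes.
    return [w[:k] for k in range(1, len(w)) if w[:k] == w[-k:]]


def lyndon_array(x):
    # Brute force from the definition: for each start position i, the length
    # of the longest Lyndon subword of x that begins at i.
    n = len(x)
    return [
        max(m for m in range(1, n - i + 1) if is_lyndon(x[i:i + m]))
        for i in range(n)
    ]


print(lyndon_array("banana"))    # [1, 2, 1, 2, 1, 1]
print(is_inverse_lyndon("bab"))  # True
print(borders("bab"))            # ['b']
```

The word `bab` is greater than both of its proper suffixes `ab` and `b`, yet it has the border `b`. A Lyndon word, by contrast, can never have a border, which is the asymmetry the abstract refers to.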

Paper Structure

This paper contains 13 sections, 17 theorems, 16 equations, 2 figures, 3 tables, and 1 algorithm.

Introduction
Contributions.
Preliminaries
Words and lexicographic orders.
The NSS/PSS Framework
The Inverse Lyndon Array
Inverse Lyndon Words
NGS/PGS Arrays and their Relationship to $\lambda^{-1}$
Combinatorial Properties of NGS/PGS Edges
The LCE-NGS Algorithm
Experimental Evaluation
Conclusions and Suffix-Sorting Perspectives

Key Result

theorem 1

$x[i..j]$ is a maximal Lyndon subword if and only if:

Figures (2)

Figure 1: NSS and PSS edges for $x=\#banana\$$. Dashed arcs denote next smaller suffix edges, while solid arcs denote previous smaller suffix edges.
Figure 2: Border correction for the inverse Lyndon array. If the maximal inverse Lyndon word starting at position $i$ has the form $z=bhb$, then the next greater suffix starts at the second occurrence of $b$. Consequently, $\mathrm{lce}(i,j)=|b|$ and $\lambda^{-1}[i]=(j-i)+|b|$.

Theorems & Definitions (33)

definition 1: Lyndon Array
theorem 1: algorithms-lyndon-array-original
definition 2: NSS and PSS Arrays
lemma 1: lyndon-simple
lemma 2: lyndon-simple
lemma 3: Non-Crossing lyndon-simple
lemma 4: Chain Iteration lyndon-simple
lemma 5: LCE Acceleration lyndon-simple
definition 3: inverse-lyndon
lemma 6: inverse-lyndon
...and 23 more
