The Inverse Lyndon Array: Definition, Properties, and Linear-Time Construction

Pietro Negri, Manuel Sica, Rocco Zaccagnino, Rosalba Zizza

Abstract

The Lyndon array stores, at each position of a word, the length of the longest Lyndon subword starting at that position, and plays an important role in combinatorics on words, for example in the construction of fundamental data structures such as the suffix array. In this paper, we introduce the inverse Lyndon array, the analogous structure for inverse Lyndon words, namely words that are lexicographically greater than all their proper suffixes. Unlike standard Lyndon words, inverse Lyndon words may have non-trivial borders, which introduces a genuine theoretical difficulty. We show that the inverse Lyndon array can be characterized in terms of the next greater suffix array together with a border-correction term, and prove that this correction coincides with a longest common extension (LCE) value. Building on this characterization, we adapt the nearest-suffix framework underlying Ellert's linear-time construction of the Lyndon array to the inverse setting, obtaining an O(n)-time algorithm for general ordered alphabets. Finally, we discuss implications for suffix comparison and report experiments on random, structured, and real datasets showing that the inverse construction exhibits the same practical linear-time behavior as the standard one.
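To make the two definitions concrete, here is a minimal brute-force sketch in Python. It is not the paper's algorithm: it checks the definitions directly and runs in polynomial rather than linear time. The function names `is_lyndon`, `lyndon_array`, and `borders` are hypothetical, positions are 0-based, and words are compared with Python's built-in string ordering.

```python
def is_lyndon(w):
    # A nonempty word is Lyndon if it is strictly smaller than
    # every one of its proper nonempty suffixes.
    return all(w < w[k:] for k in range(1, len(w)))


def lyndon_array(x):
    # Entry i is the length of the longest Lyndon subword starting at i.
    # Brute force over all lengths; intended only for small examples.
    n = len(x)
    return [
        max(l for l in range(1, n - i + 1) if is_lyndon(x[i:i + l]))
        for i in range(n)
    ]


def borders(w):
    # Non-trivial borders: proper nonempty prefixes that are also suffixes.
    return [w[:k] for k in range(1, len(w)) if w[:k] == w[-k:]]


if __name__ == "__main__":
    print(lyndon_array("banana"))
    # "an" is Lyndon and has no border; "bab" is greater than its proper
    # suffixes "ab" and "b", so it is inverse Lyndon, yet it has border "b".
    print(borders("an"), borders("bab"))
```

The word "bab" illustrates the difficulty mentioned above: it is greater than each of its proper suffixes, but the prefix "b" reappears as a suffix, which cannot happen for a standard Lyndon word.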

Paper Structure (13 sections, 17 theorems, 16 equations, 2 figures, 3 tables, 1 algorithm)


Key Result

theorem 1

$x[i..j]$ is a maximal Lyndon subword if and only if:

Figures (2)

  • Figure 1: NSS and PSS edges for $x=\#banana\$$. Dashed arcs denote next smaller suffix edges, while solid arcs denote previous smaller suffix edges.
  • Figure 2: Border correction for the inverse Lyndon array. If the maximal inverse Lyndon word starting at position $i$ has the form $z=bhb$, then the next greater suffix starts at the second occurrence of $b$. Consequently, $\mathrm{lce}(i,j)=|b|$ and $\lambda^{-1}[i]=(j-i)+|b|$.
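
The relation in the Figure 2 caption, $\lambda^{-1}[i]=(j-i)+\mathrm{lce}(i,j)$, can be checked on small inputs with the following Python sketch. It rests on assumptions not confirmed by this excerpt: positions are 0-based, suffixes are compared after appending a sentinel larger than every letter (so that a proper prefix counts as the greater suffix), and $j$ is the first position after $i$ whose suffix is greater under that order. The names `inverse_lyndon_array_naive` and `inverse_lyndon_array_via_ngs` are hypothetical, and both functions are brute force, not the linear-time construction.

```python
def is_inverse_lyndon(w):
    # Strictly greater than every proper nonempty suffix.
    return all(w > w[k:] for k in range(1, len(w)))


def inverse_lyndon_array_naive(x):
    # Directly from the definition: longest inverse Lyndon subword at each i.
    n = len(x)
    return [
        max(l for l in range(1, n - i + 1) if is_inverse_lyndon(x[i:i + l]))
        for i in range(n)
    ]


def inverse_lyndon_array_via_ngs(x):
    # Assumption: a sentinel greater than every letter of x is appended,
    # so position n always qualifies as a next greater suffix.
    top = chr(0x10FFFF)
    assert top not in x
    s = x + top
    n = len(x)
    out = []
    for i in range(n):
        # Next greater suffix: first j > i whose suffix exceeds suffix i.
        j = next(k for k in range(i + 1, n + 1) if s[k:] > s[i:])
        # Naive longest common extension of positions i and j.
        lce = 0
        while s[i + lce] == s[j + lce]:
            lce += 1
        out.append((j - i) + lce)
    return out


if __name__ == "__main__":
    for word in ["bab", "banana", "aaa", "abab", "cbacbcba"]:
        assert inverse_lyndon_array_naive(word) == inverse_lyndon_array_via_ngs(word)
    print(inverse_lyndon_array_via_ngs("bab"))  # [3, 1, 1]
```

For "bab" with $i=0$, the word has the form $bhb$ with border "b" and $h$ = "a": the next greater suffix starts at $j=2$, the common extension has length 1, and the entry is $(2-0)+1=3$.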

Theorems & Definitions (33)

  • definition 1: Lyndon Array
  • theorem 1: algorithms-lyndon-array-original
  • definition 2: NSS and PSS Arrays
  • lemma 1: lyndon-simple
  • lemma 2: lyndon-simple
  • lemma 3: Non-Crossing lyndon-simple
  • lemma 4: Chain Iteration lyndon-simple
  • lemma 5: LCE Acceleration lyndon-simple
  • definition 3: inverse-lyndon
  • lemma 6: inverse-lyndon
  • ...and 23 more