Generalized Parikh Matrices For Tracking Subsequence Occurrences
Szilárd Zsolt Fazekas, Xinhao Huang
TL;DR
The paper addresses the limitation of classical Parikh matrices in capturing occurrences of generalized subsequences and introduces a sharpened formalism based on generalized Parikh matrices. It defines the Parikh factor matrix mapping $\Phi_{M_\sigma}$ and the Parikh sequence matrix mapping $\Xi_Q$, both as morphisms from words to matrix monoids, enabling composition under concatenation: $\Phi_{M_\sigma}(w_1 w_2)=\Phi_{M_\sigma}(w_1)\,\Phi_{M_\sigma}(w_2)$ and similarly for $\Xi_Q$. By connecting these generalized matrices to the original Parikh matrices, the authors prove a nonnegativity property for certain minors of $\Xi_Q(w)$ and extend subword-history theory to generalized subsequences, including a linearization result. The work also discusses the ambiguity problem under extended Parikh matrices and suggests future directions, such as gap-constrained subsequences and potential regular-gap generalizations, with broad relevance to formal language theory.
Abstract
We introduce and study a generalized Parikh matrix mapping based on tracking the occurrence counts of special types of subsequences. These matrices retain more information about a word than the original Parikh matrix mapping while preserving the homomorphic property. We build the generalization by first introducing the Parikh factor matrix mapping and then extending it to the Parikh sequence matrix mapping. We establish an interesting connection between the generalized Parikh matrices and the original ones and use it to prove that certain important minors of a Parikh sequence matrix have nonnegative determinants. Finally, we generalize the concept of subword histories and show that each generalized subword history is equivalent to a linear one.
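
As a point of reference for the homomorphic property mentioned above, the following is a minimal Python sketch of the classical Parikh matrix mapping only. It does not implement the generalized mappings of this paper, whose definitions are not reproduced here. It assumes the standard construction: over an ordered alphabet of size k, each letter maps to the (k+1) x (k+1) identity matrix with one extra 1 just above the diagonal, and a word maps to the product of the matrices of its letters. The function names, the alphabet "abc", and the sample words are illustrative choices, not taken from the paper.

```python
def parikh_matrix(word, alphabet):
    """Classical Parikh matrix of `word` over the ordered `alphabet` (a sketch)."""
    k = len(alphabet)
    index = {letter: q for q, letter in enumerate(alphabet)}
    # The empty word maps to the (k+1) x (k+1) identity matrix.
    m = [[int(i == j) for j in range(k + 1)] for i in range(k + 1)]
    for letter in word:
        q = index[letter]
        # Right-multiplying by the matrix of the q-th letter
        # (identity plus a 1 at position (q, q+1)) adds column q to column q+1.
        for i in range(k + 1):
            m[i][q + 1] += m[i][q]
    return m


def mat_mul(a, b):
    """Plain product of two square integer matrices of equal size."""
    n = len(a)
    return [[sum(a[i][t] * b[t][j] for t in range(n)) for j in range(n)]
            for i in range(n)]


def count_subsequence(word, pattern):
    """Number of occurrences of `pattern` as a scattered subword of `word`."""
    # counts[j] = number of embeddings of pattern[:j] in the prefix read so far.
    counts = [1] + [0] * len(pattern)
    for letter in word:
        for j in range(len(pattern), 0, -1):
            if pattern[j - 1] == letter:
                counts[j] += counts[j - 1]
    return counts[-1]


alphabet = "abc"
w1, w2 = "abc", "ab"

# Homomorphic property: the matrix of a concatenation is the product of the matrices.
assert parikh_matrix(w1 + w2, alphabet) == mat_mul(
    parikh_matrix(w1, alphabet), parikh_matrix(w2, alphabet)
)

# Entry (i, j+1) above the diagonal counts the scattered subword alphabet[i..j].
m = parikh_matrix(w1 + w2, alphabet)
assert m[0][2] == count_subsequence(w1 + w2, "ab")
assert m[0][3] == count_subsequence(w1 + w2, "abc")
```

The entries above the diagonal count only subsequences made of consecutive letters of the ordered alphabet (here a, b, c, ab, bc, abc), which is the limitation that motivates the generalized matrices studied in the paper.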
