Exponent-Strings and Their Edit Distance
Ingyu Baek
TL;DR
The paper introduces exponent-strings, which extend traditional strings by allowing real-number exponents so that continuous attributes, such as duration in phonetic transcription, can be represented explicitly. It formalizes $S$-exponent-strings for an arbitrary semigroup $S$, specializes to $\mathbb{R}^+$-exponent-strings, and defines exp-edit distance as a continuous generalization of string edit distance, with a rigorous mathematical formulation and an algorithm for the $\mathbb{Q}^+$ case. A central result is that a minimum-cost exp-edit sequence exists (Theorem inf-min). As a corollary, exp-edit distance restricted to $\mathbb{N}$-exponent-strings coincides with the classical string edit distance, which ties the new theory to the traditional discrete setting. The algorithmic development includes a contraction-based reduction, a translation to standard string edit distance, and a complexity analysis. The intended applications are tasks such as phonetic similarity assessment and paraphasia detection, where precise duration information matters.
Abstract
An exponent-string is an extension of the traditional string in which each character carries a real-valued exponent indicating its quantity. This representation overcomes the limitations of traditional discrete strings by enabling precise data representation for applications such as phonetic transcription that records sound duration. Although the applications focus on exponent-strings with real-valued exponents, the formal definition uses an arbitrary semigroup: for any semigroup $S$, the exponents of an $S$-exponent-string are elements of $S$. We investigate the algebraic properties of $S$-exponent-strings and justify that $\mathbb{R}^+$-exponent-strings are a natural extension of strings. Motivated by the problem of measuring the similarity between a spoken phone sequence and the correct phone sequence, we develop exp-edit distance, a specialized metric that measures the similarity between $\mathbb{R}^+$-exponent-strings. By extending the traditional string edit distance to handle continuous values, exp-edit distance applies to $\mathbb{R}^+$-exponent-strings, which embody both discrete and continuous properties. Our exploration includes a rigorous mathematical formulation of exp-edit distance and an algorithm to compute it.
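To make the representation concrete, the following is a minimal Python sketch, not the paper's algorithm. It assumes an exponent-string can be stored as a list of (character, exponent) pairs, and that adjacent runs of the same character combine by adding their exponents, which is one plausible reading of "contraction". The names `contract`, `expand`, and `edit_distance` are hypothetical. Only the $\mathbb{N}$-exponent case is computed, where the summary states that exp-edit distance coincides with the classical edit distance; the costs used for real-valued exponents are not reproduced here.

```python
from fractions import Fraction


def contract(pairs):
    """Merge adjacent runs of the same character by adding their exponents.

    Assumption: exponents are positive and combine additively.
    """
    out = []
    for ch, exp in pairs:
        if exp <= 0:
            raise ValueError("exponents must be positive")
        if out and out[-1][0] == ch:
            out[-1] = (ch, out[-1][1] + exp)
        else:
            out.append((ch, exp))
    return out


def expand(pairs):
    """Write an exponent-string with natural-number exponents as a plain string."""
    parts = []
    for ch, exp in pairs:
        if exp != int(exp):
            raise ValueError("expand only supports natural-number exponents")
        parts.append(ch * int(exp))
    return "".join(parts)


def edit_distance(s, t):
    """Classical unit-cost edit distance (insert, delete, substitute)."""
    prev = list(range(len(t) + 1))
    for i, a in enumerate(s, 1):
        cur = [i]
        for j, b in enumerate(t, 1):
            cur.append(min(prev[j] + 1,              # delete a
                           cur[j - 1] + 1,           # insert b
                           prev[j - 1] + (a != b)))  # substitute or match
        prev = cur
    return prev[-1]


# A run of "a" lasting 1 unit followed by another lasting 1/2 unit
# contracts to a single run of 3/2 units.
mixed = contract([("a", 1), ("a", Fraction(1, 2)), ("b", 2)])

# Natural-number case: expand both exponent-strings, then compare classically.
d = edit_distance(expand([("a", 3), ("b", 1)]), expand([("a", 1), ("b", 1)]))
```

Using `Fraction` for exponents keeps rational arithmetic exact, which matters if run lengths are later compared or rescaled; floating-point exponents would also work with `contract` but can accumulate rounding error.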
