Exact Hard Monotonic Attention for Character-Level Transduction
Shijie Wu, Ryan Cotterell
TL;DR
This work investigates whether monotonicity is a beneficial inductive bias for character-level transduction by introducing an exact hard-monotonic attention framework. The authors extend prior hard attention with monotone alignment constraints and neural parameterization, enabling cubic-time marginalization over alignments and greedy decoding for inference. They demonstrate state-of-the-art single-model performance on morphological inflection and strong results on grapheme-to-phoneme conversion and named-entity transliteration, arguing that jointly learned monotone alignments are advantageous. The approach highlights the interpretability of alignment distributions and the practical viability of monotone transducers, albeit with some computational overhead relative to non-monotonic baselines. Code is released to facilitate reuse and further research.
Abstract
Many common character-level, string-to-string transduction tasks, e.g., grapheme-to-phoneme conversion and morphological inflection, consist almost exclusively of monotonic transductions. However, neural sequence-to-sequence models that use non-monotonic soft attention often outperform popular monotonic models. In this work, we ask the following question: Is monotonicity really a helpful inductive bias for these tasks? We develop a hard attention sequence-to-sequence model that enforces strict monotonicity and learns a latent alignment jointly while learning to transduce. With the help of dynamic programming, we are able to compute the exact marginalization over all monotonic alignments. Our models achieve state-of-the-art performance on morphological inflection. Furthermore, we find strong performance on two other character-level transduction tasks. Code is available at https://github.com/shijie-wu/neural-transducer.
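As an illustration of the dynamic-programming marginalization mentioned above, the following is a minimal sketch of a forward recursion over monotonic alignments. It is not the released implementation: the function names, the first-order factorization, and the toy probability tables (`init`, `trans`, `emit`) are assumptions made for this example, and in a neural model these quantities would be produced by the network. The recursion costs O(|y| |x|^2) operations for a source of length |x| and a target of length |y|, instead of enumerating every alignment.

```python
from itertools import product


def monotonic_marginal(init, trans, emit):
    """Sum the probability of the target over all monotonic alignments.

    Hypothetical inputs for illustration:
      init[j]     : p(a_1 = j), prior over the first aligned source position
      trans[k][j] : p(a_i = j | a_{i-1} = k); entries with j < k are never read
      emit[i][j]  : p(y_i | a_i = j), probability of target symbol i
                    when it is aligned to source position j
    """
    n_src = len(init)
    # alpha[j] = probability of the target prefix so far, ending at source j
    alpha = [init[j] * emit[0][j] for j in range(n_src)]
    for i in range(1, len(emit)):
        # Monotonicity: the previous position k may not exceed the current j
        alpha = [
            emit[i][j] * sum(alpha[k] * trans[k][j] for k in range(j + 1))
            for j in range(n_src)
        ]
    return sum(alpha)


def brute_force_marginal(init, trans, emit):
    """Reference check: enumerate every non-decreasing alignment explicitly."""
    n_src, n_tgt = len(init), len(emit)
    total = 0.0
    for align in product(range(n_src), repeat=n_tgt):
        if any(b < a for a, b in zip(align, align[1:])):
            continue  # skip alignments that move backwards in the source
        p = init[align[0]] * emit[0][align[0]]
        for i in range(1, n_tgt):
            p *= trans[align[i - 1]][align[i]] * emit[i][align[i]]
        total += p
    return total
```

On small inputs the two functions should agree up to floating-point error, which gives a simple sanity check that the recursion sums over exactly the monotonic alignments.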
