Deciphering Assamese Vowel Harmony with Featural InfoWaveGAN
Sneha Ray Barman, Shakuntala Mahanta, Neeraj Kumar Sharma
TL;DR
The paper investigates learning Assamese vowel harmony directly from raw speech using Featural InfoWaveGAN (fiwGAN). It demonstrates that the model captures iterative, long-distance, regressive harmony and even shows evidence of lexical learning. The model produces both harmonic and illicit outputs that reflect human-like acquisition patterns. Statistical analyses (linear mixed-effects and regression models) provide evidence that a [+high,+ATR] vowel acts as a harmony trigger. The model exhibits right-to-left harmony and meaningful generalization beyond the training data. The work highlights both the potential and the limits of unsupervised phonotactic learning from continuous speech, and it points to data augmentation and cross-language testing as avenues for future improvement.
Abstract
Traditional approaches to understanding phonological learning have relied predominantly on curated text data. Although insightful, such approaches are limited to the knowledge captured in textual representations of spoken language. To overcome this limitation, we investigate whether the Featural InfoWaveGAN model can learn iterative long-distance vowel harmony from raw speech data. We focus on Assamese, a language known for its regressive, word-bounded vowel harmony. We demonstrate that the model is adept at grasping the intricacies of Assamese phonotactics, particularly iterative long-distance harmony with regressive directionality. It also produces non-iterative illicit forms resembling the speech errors observed during human language acquisition. Our statistical analysis reveals a preference for a specific [+high,+ATR] vowel as a trigger across novel items, which indicates feature learning. More data and greater control could improve the model's proficiency, and contrasting its behavior across languages could test the universality of learning.
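The following minimal Python sketch contrasts iterative and non-iterative regressive harmony. It operates on symbolic vowel sequences rather than raw speech, and it is neither the authors' method nor part of fiwGAN. Several elements are simplified, hypothetical assumptions made for illustration: the trigger set {i, u}, the raising map ɛ→e and ɔ→o, and the function name `harmonize`. In this toy, any vowel outside these sets stops the spreading.

```python
# Toy illustration only: a simplified, HYPOTHETICAL subset of [ATR] alternations,
# not a full description of Assamese phonology.
TRIGGERS = {"i", "u"}             # assumed [+high,+ATR] triggers
RAISE = {"ɛ": "e", "ɔ": "o"}      # assumed [-ATR] -> [+ATR] mapping
PLUS_ATR = {"i", "u", "e", "o"}   # vowels already [+ATR] in this toy


def harmonize(vowels: list[str], iterative: bool = True) -> list[str]:
    """Spread [+ATR] right-to-left (regressively) from a trigger vowel.

    With iterative=True, spreading continues leftward through every raisable
    or already-[+ATR] vowel. With iterative=False, only the vowel immediately
    left of the trigger is affected, mimicking a non-iterative illicit form.
    """
    out = list(vowels)
    spreading = False
    for idx in range(len(out) - 1, -1, -1):  # scan from the right edge
        v = out[idx]
        if v in TRIGGERS:
            spreading = True                  # a trigger starts spreading
        elif spreading and v in RAISE:
            out[idx] = RAISE[v]               # raise the target vowel
            spreading = iterative             # stop after one step if non-iterative
        elif spreading and v in PLUS_ATR:
            spreading = iterative             # already harmonic; pass it on
        else:
            spreading = False                 # anything else ends spreading here
    return out


if __name__ == "__main__":
    print(harmonize(["ɔ", "ɛ", "i"]))                   # ['o', 'e', 'i']
    print(harmonize(["ɔ", "ɛ", "i"], iterative=False))  # ['ɔ', 'e', 'i']
```

With `iterative=True`, the trigger's [+ATR] value spreads leftward across every raisable vowel. With `iterative=False`, only the vowel adjacent to the trigger changes, which yields a partially harmonized form of the non-iterative kind the abstract describes.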
