Speech perception: a model of word recognition
Jean-Marc Luck, Anita Mehta
TL;DR
The paper addresses how correlations among sounds shape speech perception and word recognition, particularly under mishearing. It introduces a physics-inspired model in which phoneme-like units are encoded as spins on an open chain and words correspond to the fixed points of a descent dynamics. The resulting lexicon is rich in short words and much poorer in long ones, consistent with typical word-length distributions. Decoding behaves differently in the two regimes: for short words it stays fast and either recovers the word or returns an alternative valid word, whereas for long words mishearing can cause the process to wander and fail to converge. The authors describe this change as a qualitative dynamical phase transition between disordered and ferromagnetic regimes, controlled by a non-Hamiltonian long-range field. The work connects the model to observed Gamma-like word-length distributions and universal phoneme-to-sound ratios, offering a minimal dynamical framework for understanding word recognition and mishearing.
Abstract
We present a model of speech perception which takes into account effects of correlations between sounds. Words in this model correspond to the attractors of a suitably chosen descent dynamics. The resulting lexicon is rich in short words, and much less so in longer ones, as befits a reasonable word length distribution. We separately examine the decryption of short and long words in the presence of mishearings. In the regime of short words, the algorithm either quickly retrieves a word, or proposes another valid word. In the regime of longer words, the behaviour is markedly different. While the successful decryption of words continues to be relatively fast, there is a finite probability of getting lost permanently, as the algorithm wanders round the landscape of suitable words without ever settling on one.
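To make the phrase "words correspond to the attractors of a descent dynamics" concrete, here is a minimal toy sketch in Python. It is not the model of the paper: the abstract does not specify the dynamics, so everything below is an assumption made for illustration. The sketch uses spins of value +1 or -1 on an open chain, random nearest-neighbour couplings J, and a zero-temperature rule that flips only spins anti-aligned with their local field. The names local_field, is_word, descend and lexicon are hypothetical. Because this toy rule derives from an energy function, it always converges, so it can only illustrate the short-word behaviour (recover the word or land on another valid word), not the wandering reported for long words.

```python
import itertools
import random


def local_field(s, J, i):
    """Field on spin i from its neighbours on an open chain (assumed toy rule)."""
    h = 0
    if i > 0:
        h += J[i - 1] * s[i - 1]
    if i < len(s) - 1:
        h += J[i] * s[i + 1]
    return h


def is_word(s, J):
    """A 'word' is a fixed point: no spin is anti-aligned with its local field."""
    return all(s[i] * local_field(s, J, i) >= 0 for i in range(len(s)))


def descend(s, J, rng):
    """Flip randomly chosen unstable spins until none remain.

    Each flip strictly lowers the energy E = -sum_i J[i] * s[i] * s[i+1],
    so the loop terminates at a fixed point.
    """
    s = list(s)
    steps = 0
    while True:
        unstable = [i for i in range(len(s)) if s[i] * local_field(s, J, i) < 0]
        if not unstable:
            return tuple(s), steps
        i = rng.choice(unstable)
        s[i] = -s[i]
        steps += 1


def lexicon(J):
    """Enumerate all fixed points for a chain of len(J) + 1 spins (small n only)."""
    n = len(J) + 1
    return [s for s in itertools.product((-1, 1), repeat=n) if is_word(s, J)]


if __name__ == "__main__":
    rng = random.Random(0)
    n = 8  # chain length, kept small so exhaustive enumeration is cheap
    J = [rng.choice((-1, 1)) for _ in range(n - 1)]  # assumed random couplings

    words = lexicon(J)
    word = rng.choice(words)

    # "Mishearing": corrupt one randomly chosen spin of a valid word.
    heard = list(word)
    k = rng.randrange(n)
    heard[k] = -heard[k]

    decoded, steps = descend(heard, J, rng)
    print("lexicon size:", len(words), "of", 2 ** n, "configurations")
    print("original:", word)
    print("heard:   ", tuple(heard))
    print("decoded: ", decoded, "after", steps, "flips")
    print("recovered original word:", decoded == word)
```

Running the script prints the size of the toy lexicon and whether a single mishearing is corrected or resolved to a different fixed point. Reproducing the long-word regime would require the non-energy-based dynamics of the paper itself, which this sketch does not attempt.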
