
Speech perception: a model of word recognition

Jean-Marc Luck, Anita Mehta

TL;DR

The paper addresses how correlations among sounds shape speech perception and word recognition, particularly under mishearing. It introduces a physics-inspired model in which phoneme-like units are encoded as spins on an open chain and words correspond to the fixed points of a descent dynamics; the resulting lexicon is rich in short words, consistent with typical word-length distributions. The authors distinguish the decoding of short words, which remains fast but may yield an alternative valid word, from the decoding of long words, where mishearings can cause the process to wander and fail to converge. The model also exhibits a qualitative dynamical phase transition between disordered and ferromagnetic regimes, controlled by a non-Hamiltonian long-range field. The work connects the model to observed Gamma-like word-length distributions and universal phoneme-to-sound ratios, offering a minimal dynamical framework for understanding word recognition and mishearing.

Abstract

We present a model of speech perception which takes into account effects of correlations between sounds. Words in this model correspond to the attractors of a suitably chosen descent dynamics. The resulting lexicon is rich in short words, and much less so in longer ones, as befits a reasonable word length distribution. We separately examine the decryption of short and long words in the presence of mishearings. In the regime of short words, the algorithm either quickly retrieves a word, or proposes another valid word. In the regime of longer words, the behaviour is markedly different. While the successful decryption of words continues to be relatively fast, there is a finite probability of getting lost permanently, as the algorithm wanders round the landscape of suitable words without ever settling on one.
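To make the idea of words as attractors concrete, here is a minimal Python sketch. It does not reproduce the authors' dynamics, which the TL;DR describes as involving a non-Hamiltonian long-range field. Instead it assumes a generic stand-in: zero-temperature descent of an open Ising chain with nearest-neighbour couplings and quenched random fields drawn uniformly from $[-w/2, w/2]$. The function names (`descend`, `mishear`, `decode`) and the field distribution are hypothetical. A "word" is any fixed point, i.e. a configuration in which every spin points along its local field. A mishearing flips each spin of a word independently with probability $p$; decoding re-runs the descent and reports the residual mishearings $N$ and the running time $T$ in sweeps, under either ordered or random update.

```python
import random


def local_field(spins, fields, i):
    # Hypothetical rule: nearest-neighbour coupling on an open chain plus a quenched random field.
    h = fields[i]
    if i > 0:
        h += spins[i - 1]
    if i + 1 < len(spins):
        h += spins[i + 1]
    return h


def unstable(spins, fields, i):
    # A spin is unstable when it points against its local field.
    return spins[i] * local_field(spins, fields, i) < 0


def descend(spins, fields, ordered, rng):
    """Flip unstable spins until none remain.

    Returns the fixed point (a 'word' in this sketch) and the number of sweeps used.
    Each flip lowers E = -sum s_i s_{i+1} - sum f_i s_i, so a fixed point is reached.
    """
    spins = list(spins)
    n = len(spins)
    sweeps = 0
    while any(unstable(spins, fields, i) for i in range(n)):
        sweeps += 1
        # Ordered update visits sites left to right; random update draws n sites at random.
        sites = range(n) if ordered else [rng.randrange(n) for _ in range(n)]
        for i in sites:
            if unstable(spins, fields, i):
                spins[i] = -spins[i]
    return spins, sweeps


def mishear(word, p, rng):
    # Each symbol is independently misheard (flipped) with probability p.
    return [-s if rng.random() < p else s for s in word]


def decode(word, fields, p, ordered, rng):
    """Return (residual mishearings N, running time T in sweeps) for one decoding attempt."""
    result, sweeps = descend(mishear(word, p, rng), fields, ordered, rng)
    residual = sum(a != b for a, b in zip(result, word))
    return residual, sweeps


rng = random.Random(1)
L, w = 50, 1.0
# Assumption: fields uniform on [-w/2, w/2]; the paper's distribution may differ.
fields = [rng.uniform(-w / 2, w / 2) for _ in range(L)]
start = [rng.choice((-1, 1)) for _ in range(L)]  # disordered initial configuration

word, t0 = descend(start, fields, ordered=True, rng=rng)
word_random, t0_random = descend(start, fields, ordered=False, rng=rng)
n_res, t = decode(word, fields, p=0.1, ordered=True, rng=rng)
print(f"T0 = {t0} sweeps; residual mishearings N = {n_res}, T = {t} sweeps")
```

Because each flip lowers $E$, this stand-in always reaches a fixed point (with probability one under random update), so it cannot reproduce the non-converging wandering the abstract reports for long words. It should be read only as an illustration of the fixed-point picture and of the ordered versus random update schemes.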

Paper Structure

This paper contains 13 sections, 86 equations, 19 figures, 1 table.

Figures (19)

  • Figure 1: Configurational entropy $S$ against width $w$ of the random field distribution. Symbol: exactly known value $S_0$ in the absence of random fields (see Eq. (\ref{szero})).
  • Figure 2: Mean time $\langle T_0\rangle$ taken by the descent dynamics (\ref{dy}) to select a word $W$ from a disordered initial configuration, plotted against $\ln L$ for $w=1$. Red: Random update. Blue: Ordered update. Full lines: least-squares fits with respective slopes 1.34 and 0.63.
  • Figure 3: Mean number $\langle N\rangle$ of residual mishearings against density $p$ of initial mishearings, for $L=50$ and $w=1$. Red: Random update. Blue: Ordered update. Black dashed line: $\langle N_0\rangle$ (see Eq. (\ref{nzero})).
  • Figure 4: Mean running time $\langle T\rangle$ of the decryption process against density $p$ of mishearings. Same parameters and conventions as in Figure 3.
  • Figure 5: Amplitudes $a$ (upper curves) and $b$ (lower curves) entering the scaling results (\ref{npsca}), plotted against the width $w$ of the local field distribution. Same parameters and conventions as in Figure 3. Symbols: maximal values $a_0=1$ and $b_0=1/3$ in the absence of random fields.
  • ...and 14 more figures
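Figure 2 reports least-squares fits of $\langle T_0\rangle$ against $\ln L$, so each fitted slope is the coefficient of logarithmic growth: with slope 1.34, doubling $L$ adds about $1.34\ln 2\approx 0.93$ to the mean time, versus about $0.63\ln 2\approx 0.44$ for slope 0.63. The sketch below shows such a fit in plain Python. The function name `least_squares_line` is hypothetical, and the data are synthetic, generated from an exact line of slope 1.34; they are not taken from the paper.

```python
import math


def least_squares_line(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Synthetic data: an exact logarithmic law <T0> = 1.34 ln L + 2 (illustrative, not measured).
chain_lengths = [10, 20, 50, 100, 200, 500]
log_lengths = [math.log(L) for L in chain_lengths]
mean_times = [1.34 * x + 2.0 for x in log_lengths]

slope, intercept = least_squares_line(log_lengths, mean_times)
print(f"slope = {slope:.2f}, intercept = {intercept:.2f}")  # slope = 1.34, intercept = 2.00
```

Fitting against $\ln L$ rather than $L$ is what turns a logarithmic law into a straight line, which is why the slopes quoted in the caption are directly comparable between the two update schemes.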