A Rational Analysis of the Speech-to-Song Illusion
Raja Marjieh, Pol van Rijn, Ilia Sucholutsky, Harin Lee, Thomas L. Griffiths, Nori Jacoby
TL;DR
The paper tackles why repetition makes spoken phrases sound more song-like and why this transformation varies from phrase to phrase. It formulates a principled Bayesian framework that treats the illusion as a two-hypothesis inference problem. The framework compares the likelihoods $p(s^n \mid \text{song})$ and $p(s^n \mid \text{speech})$ of a sequence of utterances $s^n$ using corpus-derived statistics, and, crucially, the paper tests the approach with text-only data. It shows that a purely textual prose-to-lyrics illusion emerges when repetition shifts the log-odds in favor of song, a prediction supported by both human judgments and GPT-4 outputs. Together, these results provide a parsimonious, cross-modal computational account of perceptual illusions rooted in learned linguistic statistics, and they suggest avenues for richer generative modeling across languages and modalities.
Abstract
The speech-to-song illusion is a robust psychological phenomenon whereby a spoken sentence sounds increasingly musical as it is repeated. Despite decades of research, a complete formal account of this transformation is still lacking, and some of its nuanced characteristics are not well understood, namely that certain phrases appear to transform while others do not. Here we provide a formal account of this phenomenon by recasting it as a statistical inference problem in which a rational agent attempts to decide whether a sequence of utterances is more likely to have been produced in song or in speech. Using this approach and analyzing song and speech corpora, we further introduce a novel prose-to-lyrics illusion that is purely text-based. In this illusion, simply duplicating written sentences makes them appear more like song lyrics. We provide robust evidence for this new illusion in both human participants and large language models.
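To make the song-versus-speech decision concrete, here is a minimal Python sketch. It is not the authors' model. It assumes a hypothetical simplification in which the first presentation of a phrase contributes a phrase-specific log-likelihood ratio. Each further exact repetition then adds log(r_song / r_speech), where r_song and r_speech are the probabilities that a line is immediately repeated. These rates are estimated here from two tiny made-up corpora. The names `repetition_rate`, `song_log_odds`, `first_song_repetition`, `TOY_LYRICS`, and `TOY_PROSE` are illustrative only.

```python
import math
from typing import Optional, Sequence

# Hypothetical toy corpora (not real data): consecutive lines of "lyrics" and "prose".
TOY_LYRICS = ["oh baby", "oh baby", "hold me tight", "hold me tight", "all night"]
TOY_PROSE = [
    "the meeting starts at nine",
    "bring your notes",
    "we will review the budget",
    "lunch is provided",
]


def repetition_rate(lines: Sequence[str], smoothing: float = 1.0) -> float:
    """Laplace-smoothed estimate of P(a line is immediately repeated) in a corpus."""
    pairs = list(zip(lines, lines[1:]))
    repeats = sum(1 for prev, nxt in pairs if prev == nxt)
    return (repeats + smoothing) / (len(pairs) + 2 * smoothing)


def song_log_odds(n: int, base_log_lr: float, r_song: float, r_speech: float,
                  prior_log_odds: float = 0.0) -> float:
    """Posterior log p(song | s^n) - log p(speech | s^n) after n presentations.

    Assumption (not from the paper): presentation 1 contributes a phrase-specific
    log-likelihood ratio base_log_lr, and each of the n - 1 exact repeats adds
    log(r_song / r_speech).
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    return prior_log_odds + base_log_lr + (n - 1) * math.log(r_song / r_speech)


def first_song_repetition(base_log_lr: float, r_song: float, r_speech: float,
                          max_n: int = 10) -> Optional[int]:
    """Smallest n <= max_n at which the log-odds favor song, or None if never."""
    for n in range(1, max_n + 1):
        if song_log_odds(n, base_log_lr, r_song, r_speech) > 0:
            return n
    return None


if __name__ == "__main__":
    r_song = repetition_rate(TOY_LYRICS)
    r_speech = repetition_rate(TOY_PROSE)
    # Two hypothetical phrases: A is mildly speech-like, B is strongly speech-like.
    for label, base in [("phrase A", -2.0), ("phrase B", -5.0)]:
        print(label, first_song_repetition(base, r_song, r_speech))
```

With these toy repetition rates (0.5 for the lyrics, 0.2 for the prose), phrase A tips toward song at the fourth presentation and phrase B only at the seventh. Under these assumptions, this mirrors the observation that some phrases transform within a few repetitions while others do not.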
