Discovering dynamical laws for speech gestures

Sam Kirkham

TL;DR

The study demonstrates a data-driven approach to uncovering the dynamical laws that govern articulatory gestures in speech, applying sparse symbolic regression (SINDy) to X-ray Microbeam (XRMB) kinematic data. It finds that a second-order linear oscillator model with minimal damping describes the majority of gestures well, while roughly one third exhibit nonlinearity that is best captured with a cubic term, indicating autonomous nonlinear dynamics during gesture activation. The work validates SINDy on simulated data and extends it to real articulatory trajectories, comparing first- and second-order formulations and using phase-portrait and Hooke-portrait analyses to interpret the dynamics beyond fit quality. The findings support a dynamical systems view of speech production, offer interpretable dynamical laws, and highlight opportunities and challenges for data-driven discovery in cognitive science, including applications to other motor and language systems. Overall, the paper advances the prospect that language-relevant dynamical principles can be learned directly from data as autonomous, interpretable equations, with clear implications for theory and methodology.

Abstract

A fundamental challenge in the cognitive sciences is discovering the dynamics that govern behaviour. Take the example of spoken language, which is characterised by a highly variable and complex set of physical movements that map onto the small set of cognitive units that comprise language. What are the fundamental dynamical principles behind the movements that structure speech production? In this study, we discover models in the form of symbolic equations that govern articulatory gestures during speech. A sparse symbolic regression algorithm is used to discover models from kinematic data on the tongue and lips. We explore these candidate models using analytical techniques and numerical simulations, and find that a second-order linear model achieves high levels of accuracy, but a nonlinear force is required to properly model articulatory dynamics in approximately one third of cases. This supports the proposal that an autonomous, nonlinear, second-order differential equation is a viable dynamical law for articulatory gestures in speech. We conclude by identifying future opportunities and obstacles in data-driven model discovery and outline prospects for discovering the dynamical principles that govern language, brain and behaviour.
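To make the sparse-regression step concrete, here is a minimal sketch of the idea behind SINDy: build a library of candidate terms, then apply sequentially thresholded least squares so that only a few terms survive. It is written in plain NumPy and is not the paper's code or tooling, and it runs on a simulated critically damped oscillator rather than on articulatory data. The helper names (`rk4`, `stlsq`), the five-term library, the threshold of 1.0, and the parameter values are all illustrative assumptions.

```python
import numpy as np

def rk4(f, z0, dt, n_steps):
    """Integrate dz/dt = f(z) with fixed-step RK4; returns shape (n_steps + 1, dim)."""
    z = np.empty((n_steps + 1, len(z0)))
    z[0] = z0
    for i in range(n_steps):
        s1 = f(z[i])
        s2 = f(z[i] + 0.5 * dt * s1)
        s3 = f(z[i] + 0.5 * dt * s2)
        s4 = f(z[i] + dt * s3)
        z[i + 1] = z[i] + dt / 6.0 * (s1 + 2 * s2 + 2 * s3 + s4)
    return z

def stlsq(theta, target, threshold, n_iter=10):
    """Sequentially thresholded least squares: fit, zero small terms, refit the rest."""
    xi = np.linalg.lstsq(theta, target, rcond=None)[0]
    for _ in range(n_iter):
        small = np.abs(xi) < threshold
        xi[small] = 0.0
        if (~small).any():
            xi[~small] = np.linalg.lstsq(theta[:, ~small], target, rcond=None)[0]
    return xi

# Assumed ground truth for the demo: unit mass, target at 0, critical damping.
k = 2000.0
b = 2.0 * np.sqrt(k)
dt = 1e-4

def oscillator(z):
    # z = [position, velocity]; acceleration = -k*x - b*v
    return np.array([z[1], -k * z[0] - b * z[1]])

z = rk4(oscillator, np.array([1.0, 0.0]), dt, 3000)

# Estimate acceleration by finite differences, as one would from kinematic data.
# Endpoints are dropped because np.gradient uses one-sided differences there.
acc = np.gradient(z[:, 1], dt)[1:-1]
x, v = z[1:-1, 0], z[1:-1, 1]

# Candidate library for the acceleration equation (an illustrative choice).
names = ["1", "x", "v", "x^2", "x^3"]
theta = np.column_stack([np.ones_like(x), x, v, x**2, x**3])

xi = stlsq(theta, acc, threshold=1.0)
model = {name: coef for name, coef in zip(names, xi) if coef != 0.0}
print(model)
```

On this noise-free linear example the surviving terms are `x` and `v`, with coefficients close to `-k` and `-b`. Measured trajectories would additionally need smoothing or a more robust derivative estimate, and the threshold becomes a hyperparameter that trades accuracy against sparsity.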

Paper Structure

This paper contains 39 sections, 29 equations, 14 figures, 4 tables.

Figures (14)

  • Figure 1: The damped mass-spring model is a model of vocal tract dynamics. The left diagram shows a midsagittal view of the vocal tract, with a box centred on the Tongue Tip task space. The middle diagram shows a physical damped mass-spring system representing the forces that act on the Tongue Tip gesture, where $m$ is the mass, $k$ is the spring stiffness, and $b$ is the strength of the damping force. The right diagram shows simulated trajectories from the damped mass-spring model, with the Tongue Tip moving from a low to a high position (arbitrary units).
  • Figure 2: Position and absolute velocity trajectories simulated using a linear damped mass-spring model and a nonlinear (cubic) damped mass-spring model. In both cases, $x_{0} = 1$, $\dot{x}_{0} = 0$, $T = 0$, $k = 2000$, $b = 2\sqrt{k}$. The nonlinear cubic coefficient is $d = 0.95k$.
  • Figure 3: A Pareto curve showing the schematized relationship between model accuracy and model complexity, after Brunton and Kutz (2022). An ideal model occupies the Pareto optimal space, which strikes a balance between accuracy and simplicity.
  • Figure 4: Simulated position and velocity trajectories plotted against SINDy model predictions.
  • Figure 5: Ten randomly sampled trajectories for each articulatory variable, showing the fit between data and model predictions for first-order models with a third-degree polynomial library on the test data.
  • ...and 9 more figures
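
The parameter values quoted in the Figure 2 caption are enough to simulate trajectories of that kind. The sketch below assumes unit mass and the equation $\ddot{x} = -b\dot{x} - k(x - T) + d(x - T)^3$. The sign on the cubic term is an assumption, chosen so that the nonlinearity weakens the restoring force far from the target; the caption does not state it. The function name `simulate`, the RK4 integrator, the step size, and the duration are likewise illustrative choices.

```python
import numpy as np

def simulate(k, b, d, x0=1.0, v0=0.0, target=0.0, dt=1e-4, duration=0.5):
    """RK4 simulation of x'' = -b x' - k (x - T) + d (x - T)^3 (unit mass assumed)."""
    def rhs(z):
        disp = z[0] - target
        return np.array([z[1], -b * z[1] - k * disp + d * disp**3])

    n = int(round(duration / dt))
    z = np.empty((n + 1, 2))
    z[0] = (x0, v0)
    for i in range(n):
        s1 = rhs(z[i])
        s2 = rhs(z[i] + 0.5 * dt * s1)
        s3 = rhs(z[i] + 0.5 * dt * s2)
        s4 = rhs(z[i] + dt * s3)
        z[i + 1] = z[i] + dt / 6.0 * (s1 + 2 * s2 + 2 * s3 + s4)
    t = np.arange(n + 1) * dt
    return t, z[:, 0], z[:, 1]

# Values taken from the Figure 2 caption.
k = 2000.0
b = 2.0 * np.sqrt(k)   # critical damping for unit mass
runs = {
    "linear": simulate(k, b, d=0.0),
    "cubic": simulate(k, b, d=0.95 * k),
}

summary = {}
for name, (t, x, v) in runs.items():
    i_peak = int(np.argmax(np.abs(v)))
    summary[name] = {
        "peak_speed": float(abs(v[i_peak])),
        "t_peak": float(t[i_peak]),
        "x_final": float(x[-1]),
    }
    print(f"{name}: peak |v| = {summary[name]['peak_speed']:.3f} "
          f"at t = {summary[name]['t_peak']:.4f} s")
```

Under these assumptions both systems settle at the target, but the cubic system starts more slowly, so its absolute velocity peaks later and lower than in the linear case. Plotting `x` and `np.abs(v)` against `t` for each run gives position and absolute velocity curves of the kind the caption describes.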