The Dynamic Articulatory Model DYNARTmo: Dynamic Movement Generation and Speech Gestures

Bernd J. Kröger

TL;DR

The paper introduces DYNARTmo, a dynamic articulatory model that operationalizes a gesture-score framework to convert cognitive-phonological plans into continuous articulator trajectories. Gestures are defined by a temporal activation $a_g(t)$, a target vector $T_{g,P}$, and a pull weight $p_g$, yielding an instantaneous displacement $D_{g,P}(t) = a_g(t)\,T_{g,P}$ and a blended trajectory $P(t) = \frac{\sum_{g} p_g\,a_g(t)\,T_{g,P}}{\sum_{g} p_g\,a_g(t)}$. Gesture scores are proposed to be learned premotorly, though a full language-specific syllabary is still needed to reliably compute onset/offset times from phonological input, motivating integration with a mental syllabary. An accompanying web app and Python supplementary material enable visualization and reproduction of gesture sequences, facilitating language extension and cross-language exploration of coarticulation phenomena.
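
The blending rule above can be illustrated with a minimal sketch for a single control parameter. This is not the DYNARTmo implementation: the trapezoidal activation shape, the `ramp` duration, the dictionary keys, and the fallback to a `neutral` value when no gesture is active are all assumptions made for illustration. The summary does not specify how the model handles that case, although the figure captions mention neutral gestures.

```python
import numpy as np

def activation(t, onset, offset, ramp=0.05):
    """Hypothetical trapezoidal activation a_g(t) in [0, 1] (assumed shape)."""
    rise = np.clip((t - onset) / ramp, 0.0, 1.0)
    fall = np.clip((offset - t) / ramp, 0.0, 1.0)
    return np.minimum(rise, fall)

def blend(t, gestures, neutral=0.0):
    """Evaluate P(t) = sum_g p_g a_g(t) T_g / sum_g p_g a_g(t) for one parameter."""
    num = np.zeros_like(t, dtype=float)
    den = np.zeros_like(t, dtype=float)
    for g in gestures:
        # Weight of gesture g at each time step: pull weight times activation.
        w = g["pull"] * activation(t, g["onset"], g["offset"])
        num += w * g["target"]
        den += w
    # The ratio is undefined where no gesture is active; use an assumed neutral value.
    safe_den = np.where(den > 0, den, 1.0)
    return np.where(den > 0, num / safe_den, neutral)

# Two overlapping gestures with opposite targets (illustrative values only).
gestures = [
    {"target": -1.0, "pull": 1.0, "onset": 0.0, "offset": 0.4},
    {"target": 1.0, "pull": 1.0, "onset": 0.3, "offset": 0.7},
]
t = np.linspace(0.0, 1.0, 201)
trajectory = blend(t, gestures)
```

With equal pull weights, the trajectory sits midway between the two targets wherever both gestures are fully active. Raising one gesture's `pull` shifts the blend toward its target, which is how the weighted average expresses gesture dominance during overlap.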

Abstract

This paper describes the current implementation of the dynamic articulatory model DYNARTmo, which generates continuous articulator movements based on the concept of speech gestures and a corresponding gesture score. The model provides a neurobiologically inspired computational framework for simulating the hierarchical control of speech production from linguistic representation to articulatory-acoustic realization. We present the structure of the gesture inventory, the coordination of gestures in the gesture score, and their translation into continuous articulator trajectories controlling the DYNARTmo vocal tract model.

Paper Structure

This paper contains 13 sections, 4 equations, 3 figures, 4 tables.

Figures (3)

  • Figure 1: Cognitive-linguistic (phonological) representation of the gesture score of the disyllabic nonsense word “kamflik” /kam.flik/. Gestures are arranged in temporal order across four gesture tiers: vocalic, consonantal, velopharyngeal, and glottal. Blue rectangles indicate the duration of each gesture (gesture activation interval).
  • Figure 2: Phonetic-motoric representation of the gesture score and control parameter trajectories for the main articulators for the same nonsense word “kamflik” as displayed in Figure 1. Gesture activation levels (thin gray-black lines) and activation levels of neutral gestures (thin red lines), as well as control parameter trajectories for the main articulators (green-blue, red, and gray-black thick lines), are shown. Gestures and control parameter trajectories are arranged according to gesture tiers: vocalic, consonantal, velopharyngeal, glottal, and pulmonary. Vocalic tier (white background): a- and i-shaping gestures; this tier indicates vocalic height as a red line, vocalic position (fronting) as a green-blue line, and lip rounding (spreading is indicated by negative values) as a gray-black line. Consonantal tier (light blue-green background): dorsal closing (= kgN), labial closing (= pbm), labio-dental near closing (= fv), apical lateral closing (= l), and again dorsal closing gestures; this tier indicates labial closing as a green-blue line, apical closing as a dark blue line, and dorsal closing as a thick red line; values on the consonantal tier are positive and indicate the degree of consonantal constriction. Velopharyngeal tier (white background): velopharyngeal tight closing (= obstruent), closing (= oral), opening (= nasal), and so forth; this tier indicates velopharyngeal opening or closing up to tight closing as a thick green-blue line (negative values indicate opening, i.e., velum position; positive values indicate closure up to tight closure). Glottal tier (light blue-green background): opening (= open), closing (= phonation), and so forth; this tier indicates the degree of glottal opening (glottal abduction) as a thick red line. Pulmonary tier (white background): one pulmonary gesture (thick green-blue line) producing constant subglottal pressure during speaking.
  • Figure 3: Gesture activation functions (dashed lines) and resulting control parameter trajectories (solid lines) for three control parameters of our articulatory model (vocalic height, vocalic position or fronting, and labial closing) are given for /pa:i:/. Three gestures are shown: two vocalic gestures, i.e., an a-shaping followed by an i-shaping gesture and one consonantal gesture, i.e., a labial closing gesture.