
Automated Measurement of Geniohyoid Muscle Thickness During Speech Using Deep Learning and Ultrasound

Alisher Myrgyyassov, Bruce Xiao Wang, Yu Sun, Shuming Huang, Zhen Song, Min Ney Wong, Yongping Zheng

TL;DR

SMMA achieves expert-validated accuracy while eliminating the need for manual annotation, enabling scalable investigations of speech motor control and objective assessment of speech and swallowing disorders.

Abstract

Manual measurement of muscle morphology from ultrasound during speech is time-consuming and limits large-scale studies. We present SMMA, a fully automated framework that combines deep-learning segmentation with skeleton-based thickness quantification to analyze geniohyoid (GH) muscle dynamics. Validation demonstrates near-human-level accuracy (Dice = 0.9037, MAE = 0.53 mm, r = 0.901). Application to Cantonese vowel production (N = 11) reveals systematic patterns: /a:/ shows significantly greater GH thickness (7.29 mm) than /i:/ (5.95 mm, p < 0.001, Cohen's d > 1.3), suggesting greater GH activation during production of /a:/ than /i:/, consistent with its role in mandibular depression. Sex differences (5-8% greater in males) reflect anatomical scaling. SMMA achieves expert-validated accuracy while eliminating the need for manual annotation, enabling scalable investigations of speech motor control and objective assessment of speech and swallowing disorders.
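The three validation figures quoted above (Dice, MAE, and r) are standard agreement metrics. The following is a minimal sketch of how such metrics are commonly computed, not the authors' evaluation code. The function names, the nested-list mask format, and the toy values are assumptions made for illustration only.

```python
import math

def dice(pred, truth):
    """Dice overlap between two binary masks given as nested lists of 0/1."""
    p = [v for row in pred for v in row]
    t = [v for row in truth for v in row]
    intersection = sum(1 for a, b in zip(p, t) if a and b)
    total = sum(1 for a in p if a) + sum(1 for b in t if b)
    # Two empty masks are treated as perfect agreement (a convention, assumed here).
    return 2.0 * intersection / total if total else 1.0

def mae(auto, manual):
    """Mean absolute error between automated and manual thickness values (mm)."""
    return sum(abs(a - m) for a, m in zip(auto, manual)) / len(auto)

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)

# Toy data, invented for illustration (not taken from the paper).
pred_mask = [[0, 1, 1], [0, 1, 0]]
true_mask = [[0, 1, 1], [0, 0, 0]]
auto_mm = [7.0, 6.0, 5.5]
manual_mm = [7.5, 6.0, 5.0]

print(dice(pred_mask, true_mask))
print(mae(auto_mm, manual_mm))
print(pearson_r(auto_mm, manual_mm))
```

Dice scores the segmentation mask itself, whereas MAE and r compare the resulting thickness values against the sonographer's measurements, so the three numbers describe different stages of the pipeline.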

Paper Structure (16 sections, 2 equations, 3 figures, 2 tables)

Figures (3)

  • Figure 1: Visualization of the SMMA pipeline. Panel (a) shows the original image and probe placement, (b) shows the mask generated automatically by UltraUNet, and (c) shows the middle 50% of the skeleton extracted from the mask, with the corresponding thickness measurements in pixels (px) and millimetres (mm).
  • Figure 2: Component 2 thickness measurement validation: SMMA automated measurements (left) vs. sonographer ground truth annotations (right).
  • Figure 3: Representative 30-second sample from the full recording of continuous muscle thickness tracking on a female subject during speech production. Subject produces repeated /a/ - /i/ - /u/ isolated vowel sequences (purple = /a/, red = /i/, green = /u/, white = pause), each vowel lasting around 800 ms. Brief gaps in tracking occur when image quality temporarily degrades (e.g., at 75 s), demonstrating algorithm behavior under naturalistic recording conditions.
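
The Figure 1 caption describes measuring thickness over the middle 50% of the mask skeleton and reporting it in both px and mm. The sketch below illustrates that idea under a deliberate simplification: it assumes the muscle lies roughly horizontally in the image and uses per-column pixel counts in place of a true skeleton, so it is not the SMMA algorithm itself. The function name `column_thickness_mm`, the `mm_per_px` calibration value, and the toy mask are hypothetical.

```python
def column_thickness_mm(mask, mm_per_px, keep_fraction=0.5):
    """Mean thickness (mm) over the central part of a binary mask.

    Simplification (assumed, not from the paper): thickness in each image
    column is the number of foreground pixels in that column, which is only
    reasonable when the muscle runs roughly horizontally.
    """
    if not 0 < keep_fraction <= 1:
        raise ValueError("keep_fraction must be in (0, 1]")
    n_cols = len(mask[0])
    counts = [sum(row[c] for row in mask) for c in range(n_cols)]
    occupied = [c for c, n in enumerate(counts) if n > 0]
    if not occupied:
        return None  # nothing segmented in this frame
    # Drop an equal share of columns from each end, keeping the central fraction.
    trim = int(len(occupied) * (1 - keep_fraction) / 2)
    central = occupied[trim:len(occupied) - trim]
    mean_px = sum(counts[c] for c in central) / len(central)
    return mean_px * mm_per_px

# Toy mask whose column thicknesses are 1, 2, 3, 4, 4, 3, 2, 1 pixels.
profile = [1, 2, 3, 4, 4, 3, 2, 1]
toy_mask = [[1 if r < n else 0 for n in profile] for r in range(4)]

# mm_per_px = 0.1 is a made-up calibration factor for illustration.
print(column_thickness_mm(toy_mask, mm_per_px=0.1))
```

Restricting the measurement to the central portion avoids the tapered ends of the mask, where the column counts shrink and would otherwise pull the mean thickness down. Returning `None` for an empty mask is one possible way to represent a frame with no usable segmentation.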