Sustained Vowels for Pre- vs Post-Treatment COPD Classification

Andreas Triantafyllopoulos, Anton Batliner, Wolfgang Mayr, Markus Fendler, Florian Pokorny, Maurice Gerczuk, Shahin Amiriparian, Thomas Berghaus, Björn Schuller

TL;DR

This study investigates whether sustained vowels can augment automatic COPD monitoring by distinguishing pre- vs post-treatment states. Using a cohort of 50 COPD patients producing five cardinal vowels and a German read passage, the authors extract 88 acoustic features (eGeMAPS) from whole vowels and from segmented onset/centre/offset portions, with both mismatched and matched duration strategies. A nested leave-one-speaker-out SVM with speaker normalisation shows that sustained vowels add complementary information to read speech: late fusion of vowel, maximum phonation time (MPT), and phrase-based predictions achieves the best unweighted average recall, $UAR = 79\%$. The results highlight the role of voice quality, loudness dynamics, and formant dispersion in how COPD manifests in the voice. The work provides interpretable feature insights and supports integrating multiple vocal modalities for noninvasive COPD assessment, while noting the limitation of a relatively small dataset and arguing for broader multimodal future work.
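As a rough illustration of the metric and the fusion step named above, the sketch below computes UAR as the mean of per-class recalls and fuses per-modality scores by simple averaging. The averaging rule, the 0.5 threshold, the function names, and the toy scores are all assumptions made for illustration; the summary does not state how the authors combine the vowel, MPT, and phrase-based predictions.

```python
from collections import defaultdict


def unweighted_average_recall(y_true, y_pred):
    """Mean of per-class recalls, so each class counts equally."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for t, p in zip(y_true, y_pred):
        totals[t] += 1
        if t == p:
            hits[t] += 1
    return sum(hits[c] / totals[c] for c in totals) / len(totals)


def late_fusion(score_lists, threshold=0.5):
    """Average per-sample scores across modalities, then threshold.

    Hypothetical fusion rule: plain mean of scores in [0, 1], where a
    fused score >= threshold is read as 'post-treatment' (label 1).
    """
    fused = [sum(s) / len(s) for s in zip(*score_lists)]
    return [1 if f >= threshold else 0 for f in fused]


# Toy data (invented): 0 = pre-treatment, 1 = post-treatment.
y_true = [0, 0, 0, 1, 1, 1]
vowel_scores = [0.2, 0.6, 0.4, 0.7, 0.4, 0.9]
mpt_scores = [0.3, 0.5, 0.2, 0.6, 0.5, 0.8]
phrase_scores = [0.1, 0.7, 0.3, 0.8, 0.3, 0.7]

y_fused = late_fusion([vowel_scores, mpt_scores, phrase_scores])
uar = unweighted_average_recall(y_true, y_fused)
print(f"Fused labels: {y_fused}, UAR: {uar:.3f}")
```

Because UAR averages recall over classes rather than over samples, a classifier that always predicts one class scores 50% on a two-class task regardless of class balance, which makes the 71% and 79% figures comparable to a fixed chance level.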

Abstract

Chronic obstructive pulmonary disease (COPD) is a serious inflammatory lung disease affecting millions of people around the world. Due to an obstructed airflow from the lungs, it also becomes manifest in patients' vocal behaviour. Of particular importance is the detection of an exacerbation episode, which marks an acute phase and often requires hospitalisation and treatment. Previous work has shown that it is possible to distinguish between a pre- and a post-treatment state using automatic analysis of read speech. In this contribution, we examine whether sustained vowels can provide a complementary lens for telling apart these two states. Using a cohort of 50 patients, we show that the inclusion of sustained vowels can improve performance to up to 79\% unweighted average recall, from a 71\% baseline using read speech. We further identify and interpret the most important acoustic features that characterise the manifestation of COPD in sustained vowels.

Paper Structure (7 sections, 2 figures, 1 table)

This paper contains 7 sections, 2 figures, 1 table.

Figures (2)

  • Figure 1: Comparing $SP_{UAR}$ for vowels vs connected speech. For connected speech, $SP_{UAR}$ is the UAR computed over all phrases of each speaker (Triantafyllopoulos22-DBP). For vowels, $SP_{UAR}$ is the UAR computed over all 5 vowels of each speaker, using a 3-second window at onset. Spearman's $\rho$ between the two models: $.2$ (p-value: $.16$).
  • Figure 2: Top-5 features, identified by training on each feature individually using the entire segments of all vowels with speaker normalisation, extracted pre- (solid; left) and post-treatment (dashed; right). UAR [%] and p-values from a two-sided Mann-Whitney U test are shown. $\mu(dim(V))$: mean voiced segment length [in seconds]; $|V|/S$: number of voiced segments per second; $CV(F2_b)$: coefficient of variation of F2 bandwidth; $\mu(F)$: mean spectral flux; $\mu(S)$: mean shimmer; $\mu(LRS)$: mean loudness rising slope; $\mu(J)$: mean jitter; $CV(HNR)$: coefficient of variation of harmonic-to-noise ratio; $\mu(F_v)$: mean spectral flux in voiced regions; $\mu(AR_{uv})$: mean alpha ratio in unvoiced regions. Features that appear multiple times are coloured.
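
Two statistics named in the captions above, the coefficient of variation (CV) and Spearman's $\rho$, can be sketched in plain Python as follows. This is a minimal sketch under stated assumptions: the population standard deviation is used for the CV, ties receive average ranks, and the per-speaker UAR values are invented toy numbers rather than data from the paper.

```python
from statistics import mean, pstdev


def coefficient_of_variation(values):
    """Standard deviation divided by the mean (population std assumed)."""
    return pstdev(values) / mean(values)


def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Pearson correlation of the rank-transformed inputs."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Invented per-speaker UAR values for four hypothetical speakers.
speech_sp_uar = [0.5, 0.75, 1.0, 0.25]
vowel_sp_uar = [0.6, 0.8, 0.4, 0.2]
rho = spearman_rho(speech_sp_uar, vowel_sp_uar)
print(f"Spearman's rho: {rho:.2f}")
```

A low $\rho$ between the per-speaker scores of two models, as reported for Figure 1, indicates that the models tend to succeed on different speakers, which is consistent with the two modalities carrying complementary information.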