Advancing Hearing Assessment: An ASR-Based Frequency-Specific Speech Test for Diagnosing Presbycusis

Stefan Bleeck

TL;DR

This work addresses the gap between traditional audiometry and real-world speech understanding in presbycusis by introducing an Automatic Speech Recognition (ASR)–based frequency-specific speech test. The approach simulates hearing loss by acoustically degrading speech and analyzes phoneme-level confusions to generate a detailed impairment profile, enabling granular mapping of frequency-specific deficits. A two-phase curation procedure yields a 200-item test battery focused on high-frequency cues, and a diagnostic simulation shows that the battery discriminates between simulated normal-hearing and hearing-impaired listeners in a controlled setting. The framework offers a scalable, objective avenue for deeper speech perception diagnostics and sets the stage for human validation and integration with advanced AI models to enhance clinical precision and efficiency.
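The acoustic degradation step can be illustrated with a minimal Python sketch of the gain curve for a sloping loss. The threshold values in `AUDIOGRAM` and the helper names `attenuation_db` and `linear_gain` are hypothetical, because this summary does not give the paper's actual degradation parameters or processing chain. The sketch interpolates threshold shifts linearly on a log-frequency axis and converts them to amplitude multipliers.

```python
import math

# HYPOTHETICAL audiogram for a "moderate sloping" loss: (frequency in Hz, threshold shift in dB).
# Illustrative values only; not the parameters used in the paper.
AUDIOGRAM = [(250, 20.0), (500, 25.0), (1000, 35.0), (2000, 45.0), (4000, 55.0), (8000, 65.0)]


def attenuation_db(freq_hz, audiogram=AUDIOGRAM):
    """Interpolate the threshold shift linearly on a log2-frequency axis.

    Frequencies outside the audiogram range are clamped to the nearest end point.
    """
    if freq_hz <= audiogram[0][0]:
        return audiogram[0][1]
    if freq_hz >= audiogram[-1][0]:
        return audiogram[-1][1]
    for (f_lo, db_lo), (f_hi, db_hi) in zip(audiogram, audiogram[1:]):
        if f_lo <= freq_hz <= f_hi:
            # Fractional position of freq_hz between the two audiogram points, in octaves.
            t = (math.log2(freq_hz) - math.log2(f_lo)) / (math.log2(f_hi) - math.log2(f_lo))
            return db_lo + t * (db_hi - db_lo)
    raise ValueError("audiogram frequencies must be sorted in ascending order")


def linear_gain(freq_hz, audiogram=AUDIOGRAM):
    """Convert the attenuation in dB into an amplitude multiplier (e.g. for one FFT bin)."""
    return 10.0 ** (-attenuation_db(freq_hz, audiogram) / 20.0)
```

Multiplying each FFT bin of a stimulus by `linear_gain` at that bin's frequency and resynthesizing gives a simple audibility-only simulation. It does not capture supra-threshold effects such as reduced frequency selectivity or loudness recruitment, which a fuller hearing-loss simulator would add.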

Abstract

Traditional audiometry often fails to fully characterize the functional impact of hearing loss on speech understanding, particularly supra-threshold deficits and frequency-specific perception challenges in conditions like presbycusis. This paper presents the development and simulated evaluation of a novel Automatic Speech Recognition (ASR)-based frequency-specific speech test designed to provide granular diagnostic insights. Our approach leverages ASR to simulate the perceptual effects of moderate sloping hearing loss by processing speech stimuli under controlled acoustic degradation and subsequently analyzing phoneme-level confusion patterns. Key findings indicate that simulated hearing loss introduces specific phoneme confusions, predominantly affecting high-frequency consonants (e.g., alveolar/palatal to labiodental substitutions) and leading to significant phoneme deletions, consistent with the acoustic cues degraded in presbycusis. A test battery curated from these ASR-derived confusions demonstrated diagnostic value, effectively differentiating between simulated normal-hearing and hearing-impaired listeners in a comprehensive simulation. This ASR-driven methodology offers a promising avenue for developing objective, granular, and frequency-specific hearing assessment tools that complement traditional audiometry. Future work will focus on validating these findings with human participants and exploring the integration of advanced AI models for enhanced diagnostic precision.
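The phoneme-level confusion analysis can be sketched as a minimum-edit-distance alignment between the phoneme sequences recognized from clean and degraded (HL) speech. This is an assumed procedure and not necessarily the paper's. The function names, the ARPAbet-style labels and the tie-breaking order are illustrative. Each non-matching alignment step is labelled Substitution, Deletion or Insertion and formatted with the 'ErrorType_CleanPhonemeInvolved_HLPhonemeInvolved' key convention used in the figure captions.

```python
from collections import Counter


def align_ops(clean, hl):
    """Levenshtein-align two phoneme sequences and return the edit operations.

    Each operation is (error_type, clean_phoneme, hl_phoneme); matches are omitted.
    ASSUMPTION: ties are broken in favour of substitution/match, then deletion, then insertion.
    """
    n, m = len(clean), len(hl)
    # dp[i][j] = minimum number of edits turning clean[:i] into hl[:j]
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        dp[i][0] = i
    for j in range(m + 1):
        dp[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if clean[i - 1] == hl[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j - 1] + cost, dp[i - 1][j] + 1, dp[i][j - 1] + 1)
    # Backtrace from the bottom-right cell to recover one optimal alignment.
    ops, i, j = [], n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and dp[i][j] == dp[i - 1][j - 1] + (clean[i - 1] != hl[j - 1]):
            if clean[i - 1] != hl[j - 1]:
                ops.append(("Substitution", clean[i - 1], hl[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and dp[i][j] == dp[i - 1][j] + 1:
            ops.append(("Deletion", clean[i - 1], ""))
            i -= 1
        else:
            ops.append(("Insertion", "", hl[j - 1]))
            j -= 1
    return list(reversed(ops))


def confusion_key(op):
    """Format an operation as 'ErrorType_CleanPhonemeInvolved_HLPhonemeInvolved'."""
    return "_".join(op)


def top_confusions(pairs, n=20):
    """Count confusion keys over (clean, hl) sequence pairs and return the n most common."""
    counts = Counter(confusion_key(op) for clean, hl in pairs for op in align_ops(clean, hl))
    return counts.most_common(n)
```

For example, aligning the clean sequence `["S", "IY", "Z"]` with the HL sequence `["F", "IY"]` produces the keys `Substitution_S_F` and `Deletion_Z_`. These keys have the same form as the examples in the Figure 2 caption.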

Paper Structure

This paper contains 13 sections, 1 equation, 10 figures, 2 tables.

Figures (10)

  • Figure 1: Distribution of Error Types in Curated Test Items. This bar chart illustrates the distribution of error types (Substitution, Deletion, Insertion) in the final curated test item set (N=200). Blue bars represent the actual number of items selected for each error type, while dashed red lines indicate target counts derived from the overall error type percentages observed in the comprehensive ASR confusion dataset (Substitution: 52.7%, Deletion: 34.9%, Insertion: 12.4%). This figure demonstrates the effectiveness of the two-phase curation strategy in achieving a representative balance of error mechanisms.
  • Figure 2: Top N Selected Phoneme Confusion Types. This bar chart presents the top N (e.g., N=20) most frequently selected specific phoneme confusion types from the curated test item set. Each bar represents a unique confusion key, formatted as 'ErrorType_CleanPhonemeInvolved_HLPhonemeInvolved' (e.g., 'Substitution_S_F' or 'Deletion_Z_'). The height of each bar indicates the number of times that specific confusion type is represented. This figure highlights the individual phoneme-level errors prioritized by the Phase 1 selection, such as common deletions of sibilants or specific vowel substitutions, which are particularly relevant to presbycusis.
  • Figure 3: Place of Articulation Confusion Matrix (Curated Items). This heatmap depicts the distribution of place of articulation confusions within the curated test item set, specifically for substitution errors. The rows represent the place of articulation of the phoneme in the Clean ASR output, and the columns represent the place of articulation of the confused phoneme in the HL ASR output. The color intensity within each cell indicates the normalized count of observed substitutions. This visualization shows whether the curated items predominantly reflect confusions among phonemes with high-frequency acoustic cues (e.g., Alveolar/Palatal fricatives) and specific patterns of misidentification (e.g., shifts to Labiodental place), consistent with the effects of high-frequency hearing loss.
  • Figure 4: Distributions of Curated Item Characteristics (Syllable and Levenshtein Distances). This panel of histograms illustrates: (a) target word syllable count, (b) distractor word syllable count, (c) phoneme Levenshtein distance (target vs. distractor), and (d) word Levenshtein distance (target vs. distractor). These distributions confirm that the curation criteria, such as matching syllable counts and capping the word Levenshtein distance, were applied as intended, keeping targets and distractors perceptually and lexically similar.
  • Figure 5: Frequency Relevance of Curated Test Items. This bar chart shows the distribution of curated items based on the "Frequency Relevance" of the CleanPhonemeInvolved (e.g., 'High', 'Mid-High', 'Mid', 'General'). This quantifies the extent to which the test battery targets speech sounds whose perception is critically dependent on mid-to-high frequency information, aligning with the diagnostic objectives for presbycusis.
  • ...and 5 more figures
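The curation constraints described in the Figure 1 and Figure 4 captions can be sketched as follows. The error-type shares are the ones reported in the Figure 1 caption. Everything else is an assumption for illustration, because the captions do not give the exact rules. This includes the largest-remainder rounding, the ARPAbet vowel-nucleus syllable count, the hypothetical `accept_item` filter and its `max_word_dist=2` threshold.

```python
# Error-type shares in the full ASR confusion dataset, as reported in the Figure 1 caption.
ERROR_SHARES = {"Substitution": 0.527, "Deletion": 0.349, "Insertion": 0.124}

# ARPAbet vowels; each vowel phoneme is treated as one syllable nucleus (ASSUMPTION).
VOWELS = {"AA", "AE", "AH", "AO", "AW", "AY", "EH", "ER", "EY", "IH", "IY", "OW", "OY", "UH", "UW"}


def target_counts(shares, total=200):
    """Split `total` items across error types in proportion to `shares`.

    ASSUMPTION: largest-remainder rounding, so the targets always sum to `total`.
    """
    raw = {k: v * total for k, v in shares.items()}
    counts = {k: int(x) for k, x in raw.items()}
    leftover = total - sum(counts.values())
    # Give the remaining items to the types with the largest fractional parts.
    for k in sorted(raw, key=lambda k: raw[k] - counts[k], reverse=True)[:leftover]:
        counts[k] += 1
    return counts


def syllables(phonemes):
    """Count vowel nuclei in an ARPAbet transcription, ignoring stress digits."""
    return sum(1 for p in phonemes if p.rstrip("012") in VOWELS)


def levenshtein(a, b):
    """Classic edit distance between two sequences (strings or phoneme lists)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j - 1] + (x != y), prev[j] + 1, cur[j - 1] + 1))
        prev = cur
    return prev[-1]


def accept_item(target, distractor, target_ph, distractor_ph, max_word_dist=2):
    """HYPOTHETICAL filter: equal syllable counts and a bounded word-level edit distance."""
    return (syllables(target_ph) == syllables(distractor_ph)
            and levenshtein(target, distractor) <= max_word_dist)
```

With these shares, 200 items split into 105 substitution, 70 deletion and 25 insertion targets. A pair such as "sin"/"fin" passes the filter, but "sun"/"sunny" fails because the two words differ in syllable count.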