Beyond Binary: Multiclass Paraphasia Detection with Generative Pretrained Transformers and End-to-End Models

Matthew Perez, Aneesha Sampath, Minxue Niu, Emily Mower Provost

TL;DR

This paper tackles multiclass paraphasia detection in continuous aphasic speech, extending prior work that was limited to binary or single-word detection. It compares a GPT-based transcript classifier (run on either ASR or oracle transcripts) with two end-to-end models, Single-Seq and Multi-Seq, on AphasiaBank data, evaluating with $WER$, $AWER$, $TD$, and utterance-level $F1$. The single-sequence end-to-end model outperforms the GPT baselines on multiclass paraphasia detection, while GPT with oracle transcripts provides an upper bound, especially for phonemic and neologistic paraphasias. Semantic paraphasias remain challenging. The results underscore the potential for automated, clinically useful paraphasia localization in continuous speech, with implications for remote therapy and assessment via accessible metrics like $TD$ and $AWER$.
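As a rough illustration of two of the metrics named above, the sketch below computes a standard word error rate and a binary utterance-level F1. It assumes the usual textbook definitions (word-level edit distance divided by reference length; F1 over per-utterance "contains a paraphasia" labels), which may differ in detail from the paper's evaluation setup. The function names and the example sentences are hypothetical, and $AWER$ and $TD$ are not sketched because their definitions are not given in this summary.

```python
from typing import Sequence


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


def utterance_f1(true_labels: Sequence[int], pred_labels: Sequence[int]) -> float:
    """Binary F1 where each label says whether an utterance has a paraphasia.

    Assumption: one 0/1 label per utterance for a single paraphasia class.
    """
    if len(true_labels) != len(pred_labels):
        raise ValueError("label sequences must have the same length")
    pairs = list(zip(true_labels, pred_labels))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Hypothetical transcripts: one substitution and one deletion over six words.
    print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
    print(utterance_f1([1, 0, 1, 1], [1, 1, 0, 1]))
```

In a multiclass setting, one plausible use is to call `utterance_f1` once per paraphasia type with that type's per-utterance labels; whether the paper aggregates this way is not stated here.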

Abstract

Aphasia is a language disorder that can lead to speech errors known as paraphasias, which involve the misuse, substitution, or invention of words. Automatic paraphasia detection can help people with aphasia by facilitating clinical assessment and treatment planning. However, most work on automatic paraphasia detection has focused solely on binary detection, which recognizes only the presence or absence of a paraphasia. Multiclass paraphasia detection is an unexplored area of research that focuses on identifying multiple types of paraphasias and where they occur in a given speech segment. We present novel approaches that use a generative pretrained transformer (GPT) to identify paraphasias from transcripts, as well as two end-to-end approaches that model automatic speech recognition (ASR) and paraphasia classification either as multiple sequences or as a single sequence. We demonstrate that a single-sequence model outperforms GPT baselines for multiclass paraphasia detection.

Paper Structure (17 sections, 3 equations, 2 figures, 2 tables)

Figures (2)

  • Figure 1: Paraphasia Classification Models.
  • Figure 2: Utterance-level Binary F1-scores