Unified Pathological Speech Analysis with Prompt Tuning

Fei Yang, Xuenan Xu, Mengyue Wu, Kai Yu

TL;DR

This work proposes a unified pathological speech analysis system that covers as many as three diseases using prompt tuning. The system builds on a pre-trained spoken language model and performs strongly across multiple disorders while tuning only a fraction of the parameters.

Abstract

Pathological speech analysis has attracted considerable research interest for detecting diseases such as depression and Alzheimer's disease. However, previous pathological speech analysis models are commonly designed for a single disease and overlook the connections between diseases, which may constrain performance and lower training efficiency. Prompt tuning is a much more efficient training paradigm than fine-tuning a deep model separately for each task. We therefore propose a unified pathological speech analysis system for as many as three diseases based on prompt tuning. The system adjusts only a small part of the parameters to detect different diseases from the speech of potential patients. It leverages a pre-trained spoken language model and demonstrates strong performance across multiple disorders. This efficient training approach leads to faster convergence and improved F1 scores by allowing knowledge to be shared across tasks. Our experiments on Alzheimer's disease, depression, and Parkinson's disease show competitive results, highlighting the effectiveness of our method in pathological speech analysis.

Paper Structure

This paper contains 11 sections, 2 equations, 3 figures, 2 tables.

Figures (3)

  • Figure 1: The overview of our proposed system. We use prompts to guide the unified model to detect different diseases.
  • Figure 2: Schematic diagram of the unified pathological speech analysis framework. We take speech from datasets covering different diseases, languages, and class labels, and use this metadata as trainable prompts.
  • Figure 3: We use a prefix prompt to direct the unit language model's inference. Three main elements are involved: disease-specific, language, and class embeddings. Together, they offer a comprehensive context, allowing the model to produce task-specific and relevant outputs.
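
The prefix prompt described in the Figure 3 caption can be pictured with a minimal sketch. This is not the authors' implementation: the embedding size, the language codes, the class names, and every identifier below (`PrefixPrompt`, `build`, `EMBED_DIM`) are hypothetical, and appending the class embedding after the speech units is an assumption made only for illustration. The sketch shows one plausible way small trainable embedding tables could be looked up and prepended to a sequence of speech-unit embeddings, while the backbone model (not shown) stays frozen.

```python
import random

EMBED_DIM = 8  # hypothetical embedding size, chosen only for the example

# Hypothetical prompt vocabularies; the diseases follow the abstract,
# the language codes and class names are illustrative assumptions.
DISEASES = ["alzheimers", "depression", "parkinsons"]
LANGUAGES = ["en", "zh"]
CLASSES = ["control", "patient"]


def make_table(keys, dim, rng):
    """Create one small randomly initialised embedding vector per key."""
    return {key: [rng.gauss(0.0, 0.02) for _ in range(dim)] for key in keys}


class PrefixPrompt:
    """Holds the only trainable parameters in this sketch: three prompt tables."""

    def __init__(self, dim=EMBED_DIM, seed=0):
        rng = random.Random(seed)
        self.dim = dim
        self.disease = make_table(DISEASES, dim, rng)
        self.language = make_table(LANGUAGES, dim, rng)
        self.label = make_table(CLASSES, dim, rng)

    def build(self, disease, language, unit_embeddings, label=None):
        """Prepend disease and language prompts to the speech-unit embeddings.

        If a class label is given (e.g. as a training target), its embedding
        is appended at the end; this placement is an assumption.
        """
        sequence = [self.disease[disease], self.language[language]]
        sequence.extend(unit_embeddings)
        if label is not None:
            sequence.append(self.label[label])
        return sequence

    def num_trainable(self):
        """Count the prompt parameters that would be updated during tuning."""
        tables = (self.disease, self.language, self.label)
        return sum(len(table) * self.dim for table in tables)


if __name__ == "__main__":
    prompt = PrefixPrompt()
    units = [[0.0] * EMBED_DIM for _ in range(5)]  # stand-in for speech units
    sequence = prompt.build("depression", "en", units, label="patient")
    print(len(sequence), prompt.num_trainable())
```

In this sketch the same tables serve all three diseases, so switching tasks only changes which prompt vectors are looked up. That is the sense in which a single model can be steered toward different disorders.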