Exploring In-Context Learning Capabilities of ChatGPT for Pathological Speech Detection

Mahdi Amiri; Hatef Otroshi Shahreza; Ina Kodrasi

Exploring In-Context Learning Capabilities of ChatGPT for Pathological Speech Detection

Mahdi Amiri, Hatef Otroshi Shahreza, Ina Kodrasi

TL;DR

This paper addresses the interpretability gap in automatic pathological speech detection by leveraging multimodal LLMs. It investigates ChatGPT-4o in a few-shot in-context-learning setup using STFT magnitude spectrograms to detect dysarthria, producing not only classifications but also explanations. Evaluations on the Noise Reduced UA-Speech Dysarthria dataset show competitive performance relative to a SOTA CNN baseline, with the added advantage of interpretability through generated explanations. The study also conducts ablations on system prompts and input modalities, highlighting the potential and current limitations of multimodal LLMs for clinically relevant, explainable speech pathology detection and outlining directions for future improvements in explanation quality and prompt design.

Abstract

Automatic pathological speech detection approaches have shown promising results, gaining attention as potential diagnostic tools alongside costly traditional methods. While these approaches can achieve high accuracy, their lack of interpretability limits their applicability in clinical practice. In this paper, we investigate the use of multimodal Large Language Models (LLMs), specifically ChatGPT-4o, for automatic pathological speech detection in a few-shot in-context learning setting. Experimental results show that this approach not only delivers promising performance but also provides explanations for its decisions, enhancing model interpretability. To further understand its effectiveness, we conduct an ablation study to analyze the impact of different factors, such as input type and system prompts, on the final results. Our findings highlight the potential of multimodal LLMs for further exploration and advancement in automatic pathological speech detection.

Exploring In-Context Learning Capabilities of ChatGPT for Pathological Speech Detection

TL;DR

Abstract

Exploring In-Context Learning Capabilities of ChatGPT for Pathological Speech Detection

TL;DR

Abstract

Paper Structure

Table of Contents

Figures (1)