A Pilot Study on Clinician-AI Collaboration in Diagnosing Depression from Speech

Kexin Feng, Theodora Chaspari

TL;DR

Qualitative analysis suggests that such systems could be integrated into current diagnostic and screening workflows, but also highlights limitations, including clinicians' limited familiarity with AI/ML systems and the need for user-friendly, intuitive visualizations of speech information.

Abstract

This study investigates clinicians' perceptions of and attitudes toward an assistive artificial intelligence (AI) system that employs a speech-based explainable machine learning (ML) algorithm for detecting depression. The AI system detects depression from vowel-based spectrotemporal variations of speech and generates explanations through explainable AI (XAI) methods. It further provides decisions and explanations at several temporal granularities: across utterance groups, for individual utterances, and within each utterance. A small-scale user study was conducted to evaluate users' perceived usability of the system, their trust in the system, and their perceptions of design factors associated with several elements of the system. Quantitative and qualitative analyses of the collected data reveal both positive and negative aspects that influence clinicians' perceptions of the AI. The quantitative results indicate that providing more AI explanations enhances user trust but also increases system complexity. The qualitative analysis suggests that such systems could be integrated into current diagnostic and screening workflows, but also highlights limitations, including clinicians' limited familiarity with AI/ML systems and the need for user-friendly, intuitive visualizations of speech information.

Paper Structure

This paper contains 24 sections, 3 figures, 5 tables.

Figures (3)

  • Figure 1: A visualization of the interface we designed for the speech-based AI depression identification model.
  • Figure 2: A flowchart demonstrating the study protocol: interact with the system, listen to audio, and make decisions on depression.
  • Figure 3: Average scores on the System Usability Scale ('SUS'), Merit Scale ('Trust'), Interface design survey ('Design'), and Between-audio survey ('Between') across conditions. The 'Trust' survey does not apply to condition 1, and some 'Design' questions do not apply to conditions 1 and 2, because the relevant components are not included in the interface. The surveys use different scales and are not directly comparable. A summary of each question is provided; a higher value indicates stronger agreement with the question summary.