Confidence Estimation for Automatic Detection of Depression and Alzheimer's Disease Based on Clinical Interviews

Wen Wu, Chao Zhang, Philip C. Woodland

TL;DR

This work tackles reliable detection of Alzheimer's disease and depression from clinical interviews by introducing a Bayesian confidence-estimation framework that uses a dynamic Dirichlet prior to capture second-order uncertainty in predictions. A neural network predicts Dirichlet hyperparameters, and the model is trained via Bayes-risk optimization with a KL regularizer, yielding predictive distributions whose expectations provide calibrated class probabilities. Evaluations on the ADReSS and DAIC-WOZ datasets show the method improves both accuracy (e.g., AD F1=0.807, Acc=0.800) and calibration (lower ECE, higher NCE) compared to baselines, and the approach supports a reject option that increases prediction reliability at higher confidence thresholds. The approach promises more trustworthy automatic diagnostics and could extend to other input modalities beyond speech.
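
The step from Dirichlet hyperparameters to a class probability and a reject decision can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the example hyperparameter values, and the 0.7 threshold are all assumptions made here. It uses only the standard fact that the mean of a Dirichlet distribution with parameters α is α_k / Σ_j α_j.

```python
import numpy as np

def dirichlet_mean(alpha):
    """Expected class probabilities under Dir(alpha): alpha_k / sum_j alpha_j."""
    alpha = np.asarray(alpha, dtype=float)
    return alpha / alpha.sum(axis=-1, keepdims=True)

def predict_with_reject(alpha, threshold):
    """Return predicted labels, confidences, and an accept mask.

    A prediction is accepted only if its confidence (the largest expected
    class probability) reaches the threshold; otherwise it is rejected,
    for example to be deferred to a clinician.
    """
    probs = dirichlet_mean(alpha)
    labels = probs.argmax(axis=-1)
    confidence = probs.max(axis=-1)
    accepted = confidence >= threshold
    return labels, confidence, accepted

# Hypothetical hyperparameters for three interviews and two classes
# (illustrative values only, not taken from the paper).
alpha = np.array([[9.0, 1.0],
                  [1.2, 1.0],
                  [2.0, 6.0]])

labels, confidence, accepted = predict_with_reject(alpha, threshold=0.7)
for lab, conf, ok in zip(labels, confidence, accepted):
    print(f"label={lab} confidence={conf:.3f} accepted={ok}")
```

Raising the threshold rejects more inputs and keeps only the predictions the model is most confident about, which is the trade-off a reject option exposes.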

Abstract

Speech-based automatic detection of Alzheimer's disease (AD) and depression has attracted increasing attention. Confidence estimation is crucial for a trustworthy automatic diagnostic system: it informs the clinician about the confidence of model predictions and helps reduce the risk of misdiagnosis. This paper investigates confidence estimation for automatic detection of AD and depression based on clinical interviews. A novel Bayesian approach is proposed that uses a dynamic Dirichlet prior distribution to model the second-order probability of the predictive distribution. Experimental results on the publicly available ADReSS and DAIC-WOZ datasets demonstrate that the proposed method outperforms a range of baselines in both classification accuracy and confidence estimation.

Paper Structure

This paper contains 16 sections, 9 equations, 4 figures, 4 tables.

Figures (4)

  • Figure 1: Illustration of the modelling process.
  • Figure 2: Illustration of the model structure.
  • Figure 3: Comparison to the baselines in terms of AUROC and AUPRC for AD detection. The average of five runs is plotted along with standard error as error bars.
  • Figure 4: Comparison to the baselines in terms of AUROC and AUPRC for depression detection.