Soft-Weighted CrossEntropy Loss for Continuous Alzheimer's Disease Detection

Xiaohui Zhang, Wenjie Fu, Mangui Liang

TL;DR

This work targets early detection of Alzheimer's disease from speech by combining a self-supervised Wav2vec2 feature extractor (XLSR-53) with downstream classifiers (GRU and A-CNN) and a tailored Soft-Weighted CrossEntropy loss. The authors fine-tune the pre-trained model on the NCMMSC2021 dataset and investigate a freezing schedule, finding that freezing for about $1000$ steps yields robust convergence. They introduce the loss $L_{swce}=-\sum_i w_i p(x_i) \log q(x_i)$ with $w_i=\exp(- \frac{\sum_{j \neq m} (p[m]-p[j])}{N})$, which emphasizes the ground-truth probability over incorrect predictions and improves generalization. Empirical results show that pre-trained features outperform handcrafted features, and SWCE provides about a 5 percentage-point improvement on test accuracy, while reducing the gap between validation and test performance on the NCMMSC2021 Chinese AD recognition dataset. Overall, the study demonstrates the value of self-supervised speech representations and loss shaping for medical audio tasks with limited labeled data.
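
The loss above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation. The summary does not define its symbols, so the sketch assumes that $p$ inside $w_i$ is the model's predicted probability vector for sample $i$, that $m$ is the ground-truth class index, that $N$ is the number of classes, and that targets are one-hot, so $p(x_i) \log q(x_i)$ reduces to the log-probability of class $m$. The function names `softmax`, `swce_weight`, and `swce_loss` are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def swce_weight(probs, target):
    """w = exp(-sum_{j != m} (p[m] - p[j]) / N).

    Assumptions (not stated in the summary): probs is the predicted
    distribution, target is the ground-truth index m, N = len(probs).
    """
    n = len(probs)
    margin = sum(probs[target] - p for j, p in enumerate(probs) if j != target)
    return math.exp(-margin / n)


def swce_loss(batch_logits, targets):
    """Sum over samples of -w_i * log q_i[m_i], assuming one-hot targets."""
    total = 0.0
    for logits, target in zip(batch_logits, targets):
        probs = softmax(logits)
        total += -swce_weight(probs, target) * math.log(probs[target])
    return total


# Illustrative 3-class example; the logits are made up for the demonstration.
confident_right = swce_loss([[5.0, 0.0, 0.0]], [0])
confident_wrong = swce_loss([[5.0, 0.0, 0.0]], [1])
```

Under these assumptions the weight equals 1 when the prediction is uniform, drops below 1 when the ground-truth class already dominates, and rises above 1 when another class dominates, so misclassified samples contribute more to the total. A real training setup would compute the same quantities on tensors; whether gradients should flow through the weight is a further design choice that the summary does not specify.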

Abstract

Alzheimer's disease is a common cognitive disorder in the elderly. Early and accurate diagnosis of Alzheimer's disease (AD) has a major impact on the progress of research on dementia. Researchers have used machine learning methods to detect Alzheimer's disease from the speech of participants. However, the recognition accuracy of current methods is unsatisfactory, and most of them rely on low-dimensional handcrafted features to extract relevant information from audio recordings. This paper proposes an Alzheimer's disease detection system based on the pre-trained framework Wav2vec 2.0 (Wav2vec2). In addition, by replacing the loss function with the Soft-Weighted CrossEntropy loss function, we achieved 85.45\% recognition accuracy on the same test dataset.

Paper Structure

This paper contains 7 sections, 8 equations, 1 figure, and 3 tables.

Figures (1)

  • Figure 1: The architecture of the network with pre-trained model and downstream model.