Detecting Throat Cancer from Speech Signals using Machine Learning: A Scoping Literature Review

Mary Paterson; James Moor; Luisa Cutillo

TL;DR

This scoping review responds to rising throat cancer incidence by evaluating the feasibility of detecting throat cancer from speech using ML/AI. It systematically searches Scopus, Web of Science, and PubMed up to 2024, identifying 27 eligible studies that perform binary or multi-class classification on speech data. Neural networks and mel-spectrogram features dominate the literature, but results vary widely and are often hampered by small, heterogeneous datasets and limited open science. The authors emphasize the need for standardized methodologies, external validation, and publicly available code and datasets to enable reproducibility and clinical translation, and they use the TRIPOD-AI checklist as a benchmark for reporting quality.

Abstract

Introduction: Cases of throat cancer are rising worldwide. With survival decreasing significantly at later stages, early detection is vital. Artificial intelligence (AI) and machine learning (ML) have the potential to detect throat cancer from patient speech, facilitating earlier diagnosis and reducing the burden on overstretched healthcare systems. However, no comprehensive review has explored the use of AI and ML for detecting throat cancer from speech. This review aims to fill this gap by evaluating how these technologies perform and identifying issues that need to be addressed in future research.

Materials and Methods: We conducted a scoping literature review across three databases: Scopus, Web of Science, and PubMed. We included articles that classified speech using machine learning and specified the inclusion of throat cancer patients in their data. Articles were categorized based on whether they performed binary or multi-class classification.

Results: We found 27 articles fitting our inclusion criteria: 12 performing binary classification, 13 performing multi-class classification, and two performing both. The most common classification method used was neural networks, and the most frequently extracted feature was mel-spectrograms. We also documented pre-processing methods and classifier performance. We compared each article against the TRIPOD-AI checklist, which showed a significant lack of open science, with only one article sharing code and only three using open-access data.

Conclusion: Open-source code is essential for external validation and further development in this field. Our review indicates that no single method or specific feature consistently outperforms others in detecting throat cancer from speech. Future research should focus on standardizing methodologies and improving the reproducibility of results.

Paper Structure (20 sections, 5 equations, 12 figures, 4 tables)

Introduction
Background
Related Work
Objective
Methods
Search Strategy
Inclusion and Exclusion Criteria
Study Selection
Data Extraction
Results and Discussion
Overview
Datasets
Preprocessing
RQ1 - Classification Methods
RQ2 - Feature Extraction
...and 5 more sections

Figures (12)

Figure 1: The PRISMA diagram shows the steps taken to obtain relevant articles for this literature search.
Figure 2: The typical pipeline for classifying pathological speech using machine learning. Both preprocessing and feature extraction are optional steps not performed by all articles.
Figure 3: A comparison of the results obtained when evaluating using cross-validation, holdout, and external test sets. The bar represents the average accuracy, and the error bars are the minimum and maximum values.
Figure 4: The percentage of samples for healthy, cancer, and non-cancer pathologies in each article. The article cited as ben_aicha_cancer_2016 is omitted because exact counts were not provided.
Figure 5: A comparison of the results obtained when signals were preprocessed and when they were not. The bar represents the average accuracy, and the error bars are the minimum and maximum values.
...and 7 more figures
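
The captions for Figures 3 and 5 describe the same summary: a bar at the average accuracy, with error bars reaching the minimum and maximum values. A minimal sketch of that computation follows. The function name `summarize_accuracies` and all accuracy values are hypothetical illustrations, not figures taken from the reviewed articles.

```python
from statistics import mean


def summarize_accuracies(accuracies):
    """Return (bar_height, lower_error, upper_error) for one group.

    bar_height is the mean accuracy; the two error lengths are the
    distances from the mean down to the minimum and up to the maximum.
    """
    if not accuracies:
        raise ValueError("need at least one accuracy value")
    avg = mean(accuracies)
    return avg, avg - min(accuracies), max(accuracies) - avg


# Hypothetical accuracies in percent, grouped by evaluation method.
example = {
    "cross-validation": [80.0, 90.0, 100.0],
    "holdout": [70.0, 90.0],
    "external test set": [60.0],
}

summary = {name: summarize_accuracies(vals) for name, vals in example.items()}

for name, (height, lower, upper) in summary.items():
    print(f"{name}: mean={height:.1f}, "
          f"min={height - lower:.1f}, max={height + upper:.1f}")
```

Because the error bars span the full range rather than a standard deviation, a group with a single result (such as the hypothetical external test set above) gets error bars of zero length, which is worth keeping in mind when comparing groups of different sizes.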
