How to Count Coughs: An Event-Based Framework for Evaluating Automatic Cough Detection Algorithm Performance
Lara Orlandic, Jonathan Dan, Jerome Thevenot, Tomas Teijeiro, Alain Sauty, David Atienza
TL;DR
This paper addresses the mismatch between traditional sample-based (SB) cough-count metrics and the clinically relevant endpoints for chronic cough monitoring with wearables. It introduces an event-based (EB) evaluation framework aligned with European Respiratory Society (ERS) cough-monitoring endpoints. It also introduces a physiology-inspired post-processing step that converts sample-based predictions into individual cough events. The study shows that conventional SB metrics are highly sensitive to dataset class imbalance and window length, whereas EB metrics give a more clinically meaningful assessment. EB evaluation yields a higher effective detection rate ($0.731$) than SB ($0.574$) and captures the temporal structure of cough events. The work also provides open-source tools for time-based scoring and highlights the need for clinical datasets drawn from patient populations, so that methods can be compared robustly. Overall, it delivers a clinically grounded methodology for evaluating cough-counting algorithms in long-term wearable monitoring and suggests extensions to other respiratory symptoms.
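To make the SB-to-EB distinction concrete, here is a minimal sketch of the general idea. It assumes that per-window binary predictions over non-overlapping windows are merged into events and that events are scored with an any-overlap matching rule. The names `Event`, `windows_to_events`, `event_scores` and the merge parameter `max_gap_s` are hypothetical. The sketch is not the authors' physiology-inspired post-processing or their exact matching criterion.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class Event:
    start: float  # onset in seconds
    end: float    # offset in seconds


def windows_to_events(preds: Sequence[int], window_s: float,
                      max_gap_s: float = 0.0) -> List[Event]:
    """Merge positive windows into events (hypothetical rule).

    preds holds one 0/1 label per non-overlapping window of length window_s.
    Positive windows separated by a gap <= max_gap_s are bridged into one event.
    """
    events: List[Event] = []
    for i, p in enumerate(preds):
        if not p:
            continue
        start, end = i * window_s, (i + 1) * window_s
        if events and start - events[-1].end <= max_gap_s:
            # Extend the previous event instead of starting a new one.
            events[-1] = Event(events[-1].start, end)
        else:
            events.append(Event(start, end))
    return events


def overlaps(a: Event, b: Event) -> bool:
    """True if the two time intervals share any positive-length overlap."""
    return a.start < b.end and b.start < a.end


def event_scores(ref: Sequence[Event], hyp: Sequence[Event],
                 duration_s: float) -> dict:
    """Any-overlap event scoring (an assumed criterion, not the paper's).

    A reference cough counts as detected if any predicted event overlaps it;
    a predicted event is a false positive if it overlaps no reference cough.
    """
    tp = sum(any(overlaps(r, h) for h in hyp) for r in ref)
    fp = sum(not any(overlaps(h, r) for r in ref) for h in hyp)
    return {
        "tp": tp,
        "fn": len(ref) - tp,
        "fp": fp,
        "sensitivity": tp / len(ref) if ref else float("nan"),
        "precision": (len(hyp) - fp) / len(hyp) if hyp else float("nan"),
        # Clinically oriented false-alarm rate, normalized by recording time.
        "fp_per_hour": fp / (duration_s / 3600.0),
    }


if __name__ == "__main__":
    hyp = windows_to_events([0, 1, 1, 0, 0, 1, 0, 1], window_s=0.5)
    ref = [Event(0.6, 1.0), Event(2.6, 2.9), Event(5.0, 5.3)]
    print(hyp)
    print(event_scores(ref, hyp, duration_s=3600.0))
```

Under this toy rule, changing `window_s` or `max_gap_s` changes how many events a single prediction stream produces. That is one way to see why window length matters. Any-overlap matching is deliberately lenient: one long predicted event can cover several reference coughs. A stricter one-to-one matching may be preferable when counting coughs per hour.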
Abstract
Chronic cough disorders are widespread and challenging to assess because assessment relies on subjective patient questionnaires about cough frequency. Wearable devices running Machine Learning (ML) algorithms are promising for quantifying daily coughs, providing clinicians with objective metrics to track symptoms and evaluate treatments. However, there is a mismatch between state-of-the-art metrics for cough counting algorithms and the information relevant to clinicians. Most works focus on distinguishing cough from non-cough samples, which does not directly provide clinically relevant outcomes such as the number of cough events or their temporal patterns. In addition, typical metrics such as specificity and accuracy can be biased by class imbalance. We propose using event-based evaluation metrics aligned with clinical guidelines on significant cough counting endpoints. We use an ML classifier to illustrate the shortcomings of traditional sample-based accuracy measurements, highlighting their variance due to dataset class imbalance and sample window length. We also present an open-source event-based evaluation framework to test algorithm performance in identifying cough events and rejecting false positives. We provide examples and best-practice guidelines for event-based cough counting as a necessary first step to assess algorithm performance with clinical relevance.
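The class-imbalance bias can be illustrated with expected sample-level rates. This is a minimal sketch, assuming a hypothetical classifier with fixed per-window sensitivity (0.8) and specificity (0.95). These values are illustrative and are not results from the paper. Only the fraction of cough windows changes. The function name `expected_metrics` is hypothetical.

```python
def expected_metrics(sensitivity: float, specificity: float,
                     cough_fraction: float) -> tuple:
    """Expected accuracy and precision for fixed per-class rates.

    cough_fraction is the share of windows labeled as cough (class prior).
    Confusion-matrix cells are expressed as fractions of all windows.
    """
    p = cough_fraction
    tp = sensitivity * p
    tn = specificity * (1 - p)
    fp = (1 - specificity) * (1 - p)
    accuracy = tp + tn
    precision = tp / (tp + fp)
    return accuracy, precision


if __name__ == "__main__":
    # Same hypothetical classifier, increasingly imbalanced datasets.
    for frac in (0.5, 0.1, 0.01):
        acc, prec = expected_metrics(0.8, 0.95, frac)
        print(f"cough fraction={frac:<5} accuracy={acc:.3f} precision={prec:.3f}")
```

As the cough fraction drops from 0.5 to 0.01, accuracy rises from 0.875 to about 0.949 while precision falls from about 0.94 to about 0.14. In other words, the classifier appears better under accuracy even as most of its positive outputs become false alarms.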
