How to evaluate your medical time series classification?

Yihe Wang; Taida Li; Yujun Yan; Wenzhan Song; Xiang Zhang

TL;DR

This analysis aims to establish clearer guidelines for evaluating MedTS models across healthcare applications. It demonstrates step by step how the subject-dependent setup uses subject-specific features as a shortcut for classification, which leads to deceptively high performance, and suggests that the subject-independent setup is the more precise and practical evaluation setup for real-world use.

Abstract

Medical time series (MedTS) play a critical role in many healthcare applications, such as vital sign monitoring and the diagnosis of brain and heart diseases. However, the existence of subject-specific features poses unique challenges in MedTS evaluation. Inappropriate evaluation setups that either exploit or overlook these features can lead to artificially inflated classification performance (by up to 50% in accuracy on the ADFTD dataset), a concern that has received little attention in current research. Here, we categorize the existing evaluation setups into two primary categories: subject-dependent and subject-independent. We show that the subject-independent setup is more appropriate across datasets and tasks. Our theoretical analysis explores the feature components of MedTS, examining how different evaluation setups influence the features that a model learns. Through experiments on six datasets (spanning EEG, ECG, and fNIRS modalities) using four different methods, we demonstrate step by step how the subject-dependent setup uses subject-specific features as a shortcut for classification, which leads to deceptively high performance. This suggests that the subject-independent setup is the more precise and practical evaluation setup for real-world use. This comprehensive analysis aims to establish clearer guidelines for evaluating MedTS models in different healthcare applications. Code to reproduce this work is available at https://github.com/DL4mHealth/MedTS_Evaluation.
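To make the distinction between the two setups concrete, the following is a minimal sketch, not the authors' implementation. It assumes the common reading of the terms: a subject-dependent split shuffles samples without regard to subject, while a subject-independent split holds out whole subjects. The function names, the toy data (10 subjects with 50 samples each), and the 20% test fraction are all hypothetical choices made for illustration.

```python
import numpy as np


def subject_dependent_split(subject_ids, test_fraction=0.2, seed=0):
    """Shuffle all samples and split them, ignoring subject identity.

    Samples from the same subject can land in both train and test.
    """
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(subject_ids))
    n_test = int(round(test_fraction * len(idx)))
    return idx[n_test:], idx[:n_test]


def subject_independent_split(subject_ids, test_fraction=0.2, seed=0):
    """Hold out whole subjects, so test subjects are unseen in training."""
    rng = np.random.default_rng(seed)
    subjects = rng.permutation(np.unique(subject_ids))
    n_test = max(1, int(round(test_fraction * len(subjects))))
    test_mask = np.isin(subject_ids, subjects[:n_test])
    return np.flatnonzero(~test_mask), np.flatnonzero(test_mask)


def shared_subjects(subject_ids, train_idx, test_idx):
    """Return the set of subjects that appear in both train and test."""
    return set(subject_ids[train_idx]) & set(subject_ids[test_idx])


# Hypothetical toy data: 10 subjects, 50 samples each.
subject_ids = np.repeat(np.arange(10), 50)

dep_train, dep_test = subject_dependent_split(subject_ids)
ind_train, ind_test = subject_independent_split(subject_ids)

dep_overlap = shared_subjects(subject_ids, dep_train, dep_test)
ind_overlap = shared_subjects(subject_ids, ind_train, ind_test)

print("subject-dependent, subjects in both splits:", len(dep_overlap))
print("subject-independent, subjects in both splits:", len(ind_overlap))
```

Checking the subject overlap between splits is a quick way to tell which setup a given pipeline actually uses: any shared subject means a model could exploit subject-specific features at test time.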

Paper Structure (28 sections, 1 equation, 2 figures, 7 tables)

Introduction
Taxonomy of MedTS Datasets and Evaluation Setups
Types of Medical Time Series Dataset
MedTS Evaluation Setups
Method
Notations and Assumptions of Type-III MedTS Dataset
Feature Components Utilized in Subject-Dependent vs Independent
New Experimental Setups to Validate Assumptions
Experiments
Results of Subject-Dependent and Subject-Independent On Type-III MedTS
Results of Three New Experimental Setups on Type-III MedTS
Results of Subject-Discrimination
Results of Random-Label Subject-Dependent
Results of Random-Label Subject-Independent
Results of Different Setups on Type-II MedTS
...and 13 more sections

Figures (2)

Figure 1: Types of MedTS Datasets. S and C denote subject and class, respectively.
Figure 2: Types of MedTS Evaluation Setups. (a) The two main evaluation setups and their sub-types. (b) The differences between the two main setups, subject-dependent and subject-independent (figure adopted from wang2024contrast).
