Cardiomyopathy Diagnosis Model from Endomyocardial Biopsy Specimens: Appropriate Feature Space and Class Boundary in Small Sample Size Data

Masaya Mori, Yuto Omae, Yutaka Koyama, Kazuyuki Hara, Jun Toyotani, Yasuo Okumura, Hiroyuki Hao

TL;DR

The paper addresses diagnosing cardiomyopathy from scarce endomyocardial biopsy data by leveraging texture-based features and lightweight, non-deep-learning models. It systematically evaluates 39 texture features from FOS, GLDS, GLCM, GLRLM, ADF, and RDF, combined with feature selection and two-step dimensionality reduction to improve generalization on small samples. Using stratified five-fold cross-validation with nested Bayesian optimization, a support vector machine with a linear kernel coupled with FS+DC achieves the best test generalization (macro-F1 around 0.949), while dimensionality compression alone tends to overfit. The findings suggest that high-dimensional texture information can be informative for pathology classification when coupled with robust cross-validation and carefully staged dimensionality reduction, offering a pathway for rapid clinical adoption in data-limited settings.
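To make the idea of a texture feature concrete, here is a minimal sketch of one of the families named above, the gray-level co-occurrence matrix (GLCM). It assumes an image already quantized to a small number of gray levels and a single non-negative pixel offset. The three statistics (contrast, homogeneity, energy) are common GLCM descriptors chosen for illustration; they are not necessarily among the 39 features the paper uses, and the helper names `glcm` and `glcm_features` are hypothetical.

```python
import numpy as np


def glcm(image, levels, dx=1, dy=0):
    """Symmetric, normalized co-occurrence matrix for one offset.

    Assumes integer gray levels in [0, levels) and non-negative dx, dy.
    """
    img = np.asarray(image, dtype=int)
    h, w = img.shape
    # Pair every pixel with its neighbor at offset (dy, dx).
    src = img[0:h - dy, 0:w - dx]
    dst = img[dy:h, dx:w]
    counts = np.zeros((levels, levels), dtype=float)
    np.add.at(counts, (src.ravel(), dst.ravel()), 1.0)
    counts += counts.T  # count each pair in both directions
    return counts / counts.sum()


def glcm_features(p):
    """A few standard descriptors of a normalized GLCM p."""
    i, j = np.indices(p.shape)
    return {
        "contrast": float(np.sum(p * (i - j) ** 2)),
        "homogeneity": float(np.sum(p / (1.0 + (i - j) ** 2))),
        "energy": float(np.sum(p ** 2)),
    }


# Toy inputs (illustrative only, not biopsy data).
flat = np.zeros((4, 4), dtype=int)   # uniform patch
stripes = np.tile([0, 3], (4, 2))    # alternating columns 0, 3, 0, 3

flat_features = glcm_features(glcm(flat, levels=4))
stripe_features = glcm_features(glcm(stripes, levels=4))
```

A uniform patch yields zero contrast, while the striped patch yields high contrast and low homogeneity, which is the kind of difference a classifier can exploit once many such descriptors are stacked into a feature vector.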

Abstract

As the number of patients with heart failure increases, machine learning (ML) has garnered attention in cardiomyopathy diagnosis, driven by the shortage of pathologists. However, endomyocardial biopsy specimens are often available only in small sample sizes, so techniques such as feature extraction and dimensionality reduction are required. This study aims to determine whether texture features are effective for feature extraction in the pathological diagnosis of cardiomyopathy. It also examines which model designs improve generalization performance by applying feature selection (FS) and dimensional compression (DC) to several ML models. The effectiveness of texture features was verified by visualizing the inter-class distribution differences and by statistical hypothesis testing on the features. The model designs were evaluated by comparing predictive performance across combinations of FS and DC (each applied or not) and across types of decision boundary. The results confirmed that texture features may be effective for the pathological diagnosis of cardiomyopathy. Moreover, when the ratio of features to sample size is high, a multi-step process involving FS and DC improved generalization performance, with the linear-kernel support vector machine achieving the best results. This process was shown to be potentially effective for models with reduced complexity, regardless of whether the decision boundaries were linear, curved, perpendicular, or parallel to the axes. These findings are expected to facilitate the development of an effective cardiomyopathy diagnostic model and its rapid adoption in medical practice.
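The FS-then-DC design with a linear-kernel support vector machine, evaluated by nested stratified five-fold cross-validation, can be sketched with scikit-learn as follows. This is an illustration under stated assumptions, not the authors' implementation: the data are synthetic (90 samples, 39 features, three classes), FS is stood in for by an ANOVA F-test filter (`SelectKBest`), DC by PCA, and the paper's Bayesian optimization is replaced by a small grid search to keep the example dependency-free beyond scikit-learn.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for a small, high-dimensional texture data set (assumption).
X, y = make_classification(
    n_samples=90, n_features=39, n_informative=8, n_redundant=10,
    n_classes=3, n_clusters_per_class=1, random_state=0,
)

# FS, then DC, then a linear-kernel SVM. Every step is fit on training folds only,
# because the whole pipeline is refit inside each cross-validation split.
pipe = Pipeline([
    ("scale", StandardScaler()),
    ("fs", SelectKBest(f_classif)),   # assumed FS method
    ("dc", PCA()),                    # assumed DC method
    ("svm", SVC(kernel="linear")),
])

# Grid search swapped in for Bayesian optimization (assumption).
param_grid = {
    "fs__k": [10, 20],
    "dc__n_components": [2, 5],
    "svm__C": [0.1, 1.0, 10.0],
}

inner = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)  # tuning
outer = StratifiedKFold(n_splits=5, shuffle=True, random_state=1)  # testing

search = GridSearchCV(pipe, param_grid, scoring="f1_macro", cv=inner)
scores = cross_val_score(search, X, y, scoring="f1_macro", cv=outer)
print("macro-F1 per outer fold:", scores)
```

Keeping hyperparameter search in the inner loop and scoring only on the outer folds prevents the reported macro-F1 from being tuned on the same data it is measured on, which matters most when the sample size is small.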

Paper Structure

This paper contains 16 sections, 6 equations, 7 figures, 4 tables, 3 algorithms.

Figures (7)

  • Figure 1: Graphical introduction.
  • Figure 2: Overview of stratified $K$-fold cross-validation with nested structure for $K=5$.
  • Figure 3: Histopathological images of the myocardium from disease cases, borderline cases, and normal cases in six subjects.
  • Figure 4: Box plots for disease, borderline, and normal cases across all texture features, with adjusted $p$-values for the three groups from statistical hypothesis testing (features with significant differences are marked with an asterisk beside the variable name).
  • Figure : An algorithm for evaluating the predictive performance of a model design.
  • ...and 2 more figures