Unified dimensionality reduction techniques in chronic liver disease detection

Anand Karna, Naina Khan, Rahul Rauniyar, Prashant Giridhar Shambharkar

TL;DR

This work tackles chronic liver disease detection using the Indian Liver Patient Dataset (ILPD) by integrating linear (LDA, FA) and non-linear (t-SNE, UMAP) dimensionality reduction with IQR-based outlier handling and oversampling. The proposed feature-extraction and integration pipeline, followed by scaling, yields a robust classification framework in which Random Forest achieves an average cross-validated accuracy of 98.31% and a train-test accuracy of 95.79%, outperforming alternative classifiers. The study demonstrates the practical value of combining complementary dimensionality-reduction techniques to improve predictive performance in healthcare, and it outlines directions for extending the approach with additional dimensionality-reduction methods and richer clinical data. These findings have potential implications for clinical decision support by enabling faster, more accurate screening of chronic liver disease using routinely collected laboratory features.
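The two preprocessing steps named above, IQR-based outlier handling and oversampling, can be illustrated with a minimal standard-library sketch. The summary does not say whether outliers were clipped or dropped, nor which oversampling method was used, so the clipping rule (with the conventional 1.5 multiplier) and random duplication of minority rows are assumptions made for illustration. The function names `iqr_clip` and `random_oversample` are hypothetical, and the data is synthetic; only the 416/167 class counts come from the dataset description.

```python
import random
import statistics


def iqr_clip(values, k=1.5):
    """Clip values to [Q1 - k*IQR, Q3 + k*IQR] (assumed rule, not confirmed by the paper)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    lo, hi = q1 - k * iqr, q3 + k * iqr
    return [min(max(v, lo), hi) for v in values]


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen minority-class rows until every class has equal count."""
    rng = random.Random(seed)
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)
    target = max(len(members) for members in by_class.values())
    out_rows, out_labels = [], []
    for label, members in by_class.items():
        extra = [rng.choice(members) for _ in range(target - len(members))]
        out_rows.extend(members + extra)
        out_labels.extend([label] * target)
    return out_rows, out_labels


# Synthetic single-feature data with one extreme value; class counts mirror ILPD (416 vs 167).
rng = random.Random(42)
feature = [rng.gauss(1.0, 0.5) for _ in range(582)] + [75.0]
labels = [1] * 416 + [0] * 167

clipped = iqr_clip(feature)
rows, balanced = random_oversample([[v] for v in clipped], labels)
print(max(clipped) < 75.0, balanced.count(0), balanced.count(1))  # True 416 416
```

In a real evaluation, oversampling would normally be applied to the training portion only, so that duplicated rows do not appear in both training and test folds.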

Abstract

Chronic liver disease remains a major global health concern, and prompt detection and treatment depend on accurate predictive models. This study investigates several machine learning algorithms on the Indian Liver Patient Dataset (ILPD) from the UCI Machine Learning Repository at the University of California, Irvine. The dataset contains the medical records of 583 patients, 416 diagnosed with liver disease and 167 without. The work combines feature extraction with dimensionality reduction methods, including Linear Discriminant Analysis (LDA), Factor Analysis (FA), t-distributed Stochastic Neighbour Embedding (t-SNE), and Uniform Manifold Approximation and Projection (UMAP). The aim is to assess how well these methods transform high-dimensional data and improve prediction accuracy. The predictive ability of the resulting models was assessed with several classifiers, including Multi-layer Perceptron, Random Forest, K-nearest neighbours, and Logistic Regression. Random Forest achieved the highest accuracy: 98.31% in 10-fold cross-validation and 95.79% in train-test split evaluation. The findings inform the choice and use of tailored feature extraction and dimensionality reduction methods for improving predictive models of chronic liver disease.
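As a rough illustration of how reduced features from several methods might be concatenated, scaled, and scored with 10-fold cross-validation, the sketch below uses scikit-learn on synthetic stand-in data. It is not the authors' pipeline: the component counts, the Random Forest settings, and the data generator are assumptions, and the accuracy it prints has no relation to the figures reported above. Only LDA and FA are included, because scikit-learn's `TSNE` has no `transform` method for unseen samples and UMAP requires the third-party `umap-learn` package.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import FactorAnalysis
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import FeatureUnion, Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (NOT the ILPD): 583 rows with an imbalanced binary label.
X, y = make_classification(
    n_samples=583, n_features=10, n_informative=5,
    weights=[0.29, 0.71], random_state=0,
)

# Concatenate the outputs of two linear reducers side by side.
# With two classes, LDA can produce at most one component.
features = FeatureUnion([
    ("lda", LinearDiscriminantAnalysis(n_components=1)),
    ("fa", FactorAnalysis(n_components=3, random_state=0)),  # 3 is an arbitrary choice
])

# Reduce, scale, then classify; every step is refit inside each CV fold.
model = Pipeline([
    ("features", features),
    ("scale", StandardScaler()),
    ("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
])

cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
scores = cross_val_score(model, X, y, cv=cv, scoring="accuracy")
print(f"mean 10-fold accuracy on synthetic data: {scores.mean():.3f}")
```

Wrapping the reducers in a `Pipeline` means the supervised LDA step is fitted on each training fold only, which keeps label information from the held-out fold out of the extracted features.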

Paper Structure

This paper contains 14 sections, 20 equations, 13 figures, and 4 tables.

Figures (13)

  • Figure 1: A flow diagram depicting the methodology used in the study
  • Figure 2: ROC curves in 10-fold CV
  • Figure 3: ROC curves in Train-Test Split
  • Figure 4: Precision-Recall curves in 10-fold cross validation
  • Figure 5: Precision-Recall curves in Train-Test Split
  • ...and 8 more figures