Risk Factor Identification In Osteoporosis Using Unsupervised Machine Learning Techniques

Mikayla Calitis

TL;DR

This study identifies osteoporosis risk factors by applying unsupervised learning to large electronic health record data. It introduces the CLustering Iterations Framework (CLIF), which identifies principal features via the Wasserstein distance $W$ and selects features using a combination of ANOVA and ablation tests. Applied to NHANES data with $N = 101{,}316$, five iterations of HDBSCAN yield dense clusters (density $\geq 0.85$) that are differentiated by features such as age, fracture history, daily corticosteroid use, and parental osteoporosis, with age repeatedly emerging as a key factor. The results support some established associations while challenging others, suggesting that iterative, unsupervised clustering can reveal robust risk signatures and guide future validation in osteoporosis research.

Abstract

This study investigates the reliability of identified risk factors associated with osteoporosis using a new clustering-based method applied to electronic medical records. It proposes the CLustering Iterations Framework (CLIF), an iterative clustering framework in which any of three components can be adapted: clustering, feature selection, and principal feature identification. Principal features are identified using the Wasserstein distance, a concept borrowed from optimal transport theory, and influential features are selected from a data set using a combination of ANOVA and ablation tests. Some risk factors presented in existing work are endorsed by the significant clusters identified here, while the reliability of others is weakened.
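To make the idea of Wasserstein-based principal feature identification concrete, the following is a minimal sketch and not the paper's implementation. It assumes that a feature is scored by the 1-D Wasserstein-1 distance between its values inside a cluster and its values outside that cluster, and that features have already been scaled to a common range. The function names `wasserstein_1d` and `rank_features`, the feature names, and the toy rows are all hypothetical.

```python
from bisect import bisect_right


def wasserstein_1d(u, v):
    """Wasserstein-1 distance between two 1-D empirical samples.

    In one dimension this equals the area between the two empirical CDFs.
    """
    u_sorted, v_sorted = sorted(u), sorted(v)
    grid = sorted(set(u_sorted) | set(v_sorted))
    total = 0.0
    for left, right in zip(grid, grid[1:]):
        # Both empirical CDFs are constant on [left, right).
        cdf_u = bisect_right(u_sorted, left) / len(u_sorted)
        cdf_v = bisect_right(v_sorted, left) / len(v_sorted)
        total += abs(cdf_u - cdf_v) * (right - left)
    return total


def rank_features(cluster_rows, other_rows, feature_names):
    """Rank features by how far their in-cluster distribution sits
    from their out-of-cluster distribution (assumed scoring rule)."""
    scores = {}
    for j, name in enumerate(feature_names):
        in_cluster = [row[j] for row in cluster_rows]
        outside = [row[j] for row in other_rows]
        scores[name] = wasserstein_1d(in_cluster, outside)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Made-up toy data: (age scaled to [0, 1], prior fracture as 0/1).
feature_names = ["age", "prior_fracture"]
cluster_rows = [(0.90, 1), (0.80, 1), (0.85, 0)]
other_rows = [(0.30, 0), (0.40, 1), (0.35, 0)]

ranking = rank_features(cluster_rows, other_rows, feature_names)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

On this toy data, age separates the cluster from the remaining rows more strongly than fracture history, so it ranks first. In practice `scipy.stats.wasserstein_distance` computes the same 1-D quantity; the pure-Python version is used here only to keep the sketch dependency-free.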

Paper Structure

This paper contains 15 sections, 10 figures, and 4 algorithms.

Figures (10)

  • Figure 1: Distribution of Participant Ages
  • Figure 2: Distribution of Participant Ages by Gender
  • Figure 3: Percentages of Participant Ethnic Background
  • Figure 4: Age Distribution by Osteoporosis Diagnosis
  • Figure 5: Average Age of Osteoporosis Patients Grouped by Gender and Diagnosis
  • ...and 5 more figures