Risk Factor Identification In Osteoporosis Using Unsupervised Machine Learning Techniques

Mikayla Calitis

TL;DR

This study identifies osteoporosis risk factors by applying unsupervised learning to large electronic health record data. It introduces the CLustering Iterations Framework (CLIF) and principal feature identification via the Wasserstein distance $W$, combined with ANOVA and ablation tests for feature selection. Applied to NHANES data with $N = 101{,}316$, it finds dense clusters (density $\geq 0.85$) across five iterations of HDBSCAN. The clusters are differentiated by features such as age, fracture history, daily corticosteroid use, and parental osteoporosis, with age repeatedly emerging as a key factor. The results support some established associations while challenging others, demonstrating the potential of iterative, unsupervised clustering to reveal robust risk signatures and guide future validation in osteoporosis research.

Abstract

This study investigates the reliability of previously identified osteoporosis risk factors using a new clustering-based method applied to electronic medical records. It proposes the CLustering Iterations Framework (CLIF), an iterative framework in which any of three components can be adapted: clustering, feature selection, and principal feature identification. Principal features are identified using the Wasserstein distance, borrowing concepts from optimal transport theory, and influential features are selected from a data set using a combination of ANOVA and ablation tests. The significant clusters identified endorse some risk factors presented in existing works, while weakening the reliability of others.
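As a rough illustration of how a Wasserstein distance can flag a principal feature, the sketch below scores each feature by the 1-D Wasserstein-1 distance between its values inside one cluster and its values in the rest of the data. The abstract does not say how CLIF applies the distance, so the inside-versus-outside comparison, the function names `wasserstein_1d` and `rank_features`, and the toy rows and feature names are all assumptions made for illustration. They are not the paper's implementation or its data.

```python
from bisect import bisect_right


def wasserstein_1d(u, v):
    """Wasserstein-1 distance between two empirical 1-D samples.

    Computed as the integral of |F_u - F_v| over the pooled sample values,
    where F_u and F_v are the empirical CDFs.
    """
    u_sorted, v_sorted = sorted(u), sorted(v)
    grid = sorted(set(u_sorted) | set(v_sorted))
    total = 0.0
    for left, right in zip(grid, grid[1:]):
        f_u = bisect_right(u_sorted, left) / len(u_sorted)
        f_v = bisect_right(v_sorted, left) / len(v_sorted)
        total += abs(f_u - f_v) * (right - left)
    return total


def rank_features(rows, labels, cluster_id, feature_names):
    """Rank features by how far a cluster's distribution sits from the rest.

    Assumption: features are already scaled to comparable ranges, otherwise
    the distances are not comparable across features.
    """
    scores = {}
    for j, name in enumerate(feature_names):
        inside = [r[j] for r, lab in zip(rows, labels) if lab == cluster_id]
        outside = [r[j] for r, lab in zip(rows, labels) if lab != cluster_id]
        scores[name] = wasserstein_1d(inside, outside)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical toy data: (scaled age, daily corticosteroid use flag).
rows = [(0.9, 0.0), (0.8, 1.0), (0.2, 0.0), (0.1, 1.0)]
labels = [0, 0, 1, 1]  # hypothetical cluster assignments
ranking = rank_features(rows, labels, 0, ["age_scaled", "steroid_use"])
print(ranking)
```

In this toy case the scaled age separates cluster 0 from the remaining rows while the corticosteroid flag is distributed identically in both groups, so age ranks first. In practice a library routine such as `scipy.stats.wasserstein_distance` could replace the hand-written function.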

Paper Structure (15 sections, 10 figures, 4 algorithms)

Introduction
Related Works
Methods
Experiments
Data
Experimental Methods
Data Collection
Data Preprocessing
Application of Methods
Experimental Results
Conclusions
Discussion
Conclusion
Future Works
Acknowledgments

Figures (10)

Figure 1: Distribution of Participant Ages
Figure 2: Distribution of Participant Ages by Gender
Figure 3: Percentages of Participant Ethnic Background
Figure 4: Age Distribution by Osteoporosis Diagnosis
Figure 5: Average Age of Osteoporosis Patients Grouped by Gender and Diagnosis
...and 5 more figures
