FH-TabNet: Multi-Class Familial Hypercholesterolemia Detection via a Multi-Stage Tabular Deep Learning Network
Sadaf Khademi, Zohreh Hajiakhondi, Golnaz Vaseghi, Nizal Sarrafzadegan, Arash Mohammadi
TL;DR
This paper addresses two problems: FH is underdiagnosed, and existing ML-based FH detection is limited to binary classification. It proposes FH-TabNet, a multi-stage TabNet-based framework that performs four-class FH staging (Definite, Probable, Possible, Unlikely) from EMR-derived features, without genetic data. The model uses a two-stage strategy. Stage 1 separates patients (Probable/Definite) from healthy individuals (Possible/Unlikely). Stage 2 then applies two parallel binary classifiers, one per group, to assign the refined subcategory. TabNet's per-sample feature attention and masking further enhance the learned representation. Evaluation with 5-fold cross-validation demonstrates high accuracy and robust performance, particularly for the low-prevalence classes, and FH-TabNet outperforms both traditional ML baselines and a single-stage TabNet. Because the approach relies only on accessible EMR and laboratory data, it has potential for cost-effective, scalable FH screening in resource-limited settings. These findings support broader adoption of DL-based tabular models for genetic disorders in which multi-class risk stratification is clinically valuable.
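The two-stage routing described above can be sketched as a small hierarchical classifier. This is a minimal, dependency-free illustration, not the authors' implementation. `NearestCentroid` is a hypothetical stand-in for a TabNet binary classifier; any model with `fit`/`predict` methods could be plugged in through `make_model`. The class name `TwoStageClassifier` and the choice to train each Stage-2 model on samples whose ground-truth label falls in its group are also assumptions. The feature values in the usage lines are toy numbers, not clinical data.

```python
from collections import defaultdict

# Label hierarchy from the text: Stage 1 separates healthy from patient,
# Stage 2 refines each group into its two subcategories.
GROUP_OF = {
    "Unlikely": "healthy", "Possible": "healthy",
    "Probable": "patient", "Definite": "patient",
}


class NearestCentroid:
    """Hypothetical stand-in for a TabNet binary classifier (fit/predict only)."""

    def fit(self, X, y):
        sums, counts = {}, defaultdict(int)
        for row, label in zip(X, y):
            acc = sums.setdefault(label, [0.0] * len(row))
            for j, v in enumerate(row):
                acc[j] += v
            counts[label] += 1
        # One centroid (per-feature mean) per class label.
        self.centroids = {lab: [s / counts[lab] for s in acc] for lab, acc in sums.items()}
        return self

    def predict(self, X):
        def dist2(a, b):
            return sum((p - q) ** 2 for p, q in zip(a, b))
        # Assign each row to the class with the closest centroid.
        return [min(self.centroids, key=lambda lab: dist2(row, self.centroids[lab])) for row in X]


class TwoStageClassifier:
    """Routes each sample through Stage 1, then through the matching Stage-2 model."""

    def __init__(self, make_model=NearestCentroid):
        self.make_model = make_model

    def fit(self, X, y):
        # Stage 1: binary healthy-vs-patient model trained on all samples.
        self.stage1 = self.make_model().fit(X, [GROUP_OF[lab] for lab in y])
        # Stage 2 (assumption): one independent binary model per group, trained
        # only on samples whose ground-truth label belongs to that group.
        self.stage2 = {}
        for group in ("healthy", "patient"):
            idx = [i for i, lab in enumerate(y) if GROUP_OF[lab] == group]
            self.stage2[group] = self.make_model().fit([X[i] for i in idx], [y[i] for i in idx])
        return self

    def predict(self, X):
        groups = self.stage1.predict(X)
        # Each sample is refined only by the Stage-2 model of its predicted group.
        return [self.stage2[g].predict([row])[0] for row, g in zip(X, groups)]


# Toy usage with made-up two-feature rows.
X = [[1.0, 0.0], [1.2, 0.1], [3.0, 0.0], [3.1, 0.2],
     [6.0, 1.0], [6.2, 1.1], [9.0, 1.0], [9.1, 1.2]]
y = ["Unlikely", "Unlikely", "Possible", "Possible",
     "Probable", "Probable", "Definite", "Definite"]
model = TwoStageClassifier().fit(X, y)
print(model.predict([[1.1, 0.0], [9.0, 1.1]]))  # ['Unlikely', 'Definite']
```

In this design, a Stage-1 error cannot be corrected in Stage 2, because each sample reaches only one Stage-2 model. The benefit is that each Stage-2 model solves a narrower binary problem on a more homogeneous subset.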
Abstract
Familial Hypercholesterolemia (FH) is a genetic disorder characterized by elevated Low-Density Lipoprotein (LDL) cholesterol levels or by pathogenic variants in the genes associated with LDL metabolism. Early and accurate categorization of FH is important because it allows timely interventions that mitigate the risk of life-threatening conditions. The conventional diagnostic approach, however, is complex and costly, and its results are challenging to interpret even for experienced clinicians, which leads to high underdiagnosis rates. There has been a recent surge of interest in using Machine Learning (ML) models for early FH detection. However, existing solutions consider only binary classification and rely solely on classical ML models. Despite its potential, the application of Deep Learning (DL) to FH detection is still in its infancy, possibly due to the categorical nature of the underlying clinical data. This paper addresses this gap by introducing FH-TabNet, a multi-stage tabular DL network for multi-class (Definite, Probable, Possible, and Unlikely) FH detection. FH-TabNet first applies a deep tabular data learning architecture (TabNet) to perform a primary categorization into healthy (Possible/Unlikely) and patient (Probable/Definite) classes. Independent TabNet classifiers are then applied to each subgroup, enabling refined classification. Evaluation with 5-fold cross-validation shows superior performance in categorizing FH patients, particularly in the challenging low-prevalence subcategories.
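Low-prevalence subcategories are easy to under-represent in some folds of a 5-fold evaluation. The abstract does not state whether the folds were stratified, so the following is only a minimal sketch under that assumption. It deals each class's shuffled indices round-robin across the folds, so every fold receives a proportional share of each class. The function name `stratified_kfold` and the class counts in the usage lines are hypothetical and are not the paper's data distribution.

```python
import random
from collections import Counter, defaultdict


def stratified_kfold(labels, k=5, seed=0):
    """Return k (train_idx, test_idx) pairs with each class spread across folds."""
    by_class = defaultdict(list)
    for i, lab in enumerate(labels):
        by_class[lab].append(i)
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    offset = 0
    for lab in sorted(by_class):
        idx = by_class[lab]
        rng.shuffle(idx)
        # Deal indices round-robin, continuing where the previous class stopped,
        # so fold sizes stay balanced and every fold gets its share of each class.
        for j, i in enumerate(idx):
            folds[(offset + j) % k].append(i)
        offset += len(idx)
    splits = []
    for f in range(k):
        test = sorted(folds[f])
        test_set = set(test)
        train = [i for i in range(len(labels)) if i not in test_set]
        splits.append((train, test))
    return splits


# Toy, imbalanced label list (hypothetical counts).
labels = ["Unlikely"] * 60 + ["Possible"] * 25 + ["Probable"] * 10 + ["Definite"] * 5
for train, test in stratified_kfold(labels):
    # Prints: train size, test size, number of "Definite" samples in the test fold.
    print(len(train), len(test), Counter(labels[i] for i in test)["Definite"])
```

With these counts, each of the five folds holds 20 test samples, including exactly one "Definite" sample. In practice, scikit-learn's `StratifiedKFold(n_splits=5, shuffle=True, random_state=...)` provides the same behavior.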
