Hybrid (Penalized Regression and MLP) Models for Outcome Prediction in HDLSS Health Data
Mithra D K
TL;DR
This work tackles diabetes prediction in high-dimension, low-sample-size (HDLSS) health data by pairing penalized regression with a compact multilayer perceptron (MLP). The hybrid framework uses stable, sparse linear feature selection to constrain the neural network, which addresses the overfitting and instability common in high-dimensional settings. In a two-stage evaluation on NHANES data, the refined pipeline achieves higher recall and F1 while keeping AUC similar to that of strong linear baselines, and its feature importances are interpretable and dominated by cardiometabolic indicators. The approach offers a practical blueprint for combining linear stability with nonlinear modeling in real-world, high-dimensional health datasets.
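As a rough illustration of the idea summarized above, the following minimal sketch chains an L1-penalized logistic regression (as the sparse selector) with a small MLP using scikit-learn. It is not the pipeline used in this work: the synthetic data, the penalty strength `C=0.2`, the hidden layer size, and all variable names are assumptions chosen only to make the example self-contained.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score, recall_score, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for an HDLSS table (assumption: this is NOT NHANES data).
# 200 rows and 500 features, of which only 10 carry signal.
X, y = make_classification(
    n_samples=200, n_features=500, n_informative=10,
    n_redundant=0, random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

hybrid = Pipeline([
    ("scale", StandardScaler()),
    # Stage 1: L1-penalized logistic regression zeroes out most coefficients;
    # SelectFromModel keeps only the features with non-zero weight.
    ("select", SelectFromModel(
        LogisticRegression(penalty="l1", solver="liblinear", C=0.2)
    )),
    # Stage 2: a compact MLP is trained on the selected features only.
    ("mlp", MLPClassifier(hidden_layer_sizes=(16,), alpha=1e-2,
                          max_iter=1000, random_state=0)),
])
hybrid.fit(X_train, y_train)

n_selected = int(hybrid.named_steps["select"].get_support().sum())
proba = hybrid.predict_proba(X_test)[:, 1]
pred = hybrid.predict(X_test)

print(f"features kept: {n_selected} of {X.shape[1]}")
print(f"AUC:    {roc_auc_score(y_test, proba):.3f}")
print(f"recall: {recall_score(y_test, pred):.3f}")
print(f"F1:     {f1_score(y_test, pred):.3f}")
```

Putting the selector inside the `Pipeline` means it is refit on each training split, so feature selection never sees held-out rows. That matters in HDLSS settings, where selecting features on the full dataset can inflate test metrics.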
Abstract
I present an application of established machine learning techniques to NHANES health survey data for predicting diabetes status. I compare baseline models (logistic regression, random forest, XGBoost) with a hybrid approach that uses an XGBoost feature encoder and a lightweight multilayer perceptron (MLP) head. Experiments show the hybrid model attains improved AUC and balanced accuracy compared to baselines on the processed NHANES subset. I release code and reproducible scripts to encourage replication.
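One common way to use boosted trees as a feature encoder is to map each sample to the leaf it reaches in every tree, one-hot encode those leaf indices, and train a small MLP head on the result. The sketch below follows that reading, which is an assumption about what "feature encoder" means here. To stay dependency-light it substitutes scikit-learn's `GradientBoostingClassifier` for XGBoost and synthetic data for NHANES; the helper `leaf_indices` and the function `report` are hypothetical names introduced for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import balanced_accuracy_score, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import OneHotEncoder

# Synthetic binary-outcome data (assumption: a stand-in, not NHANES).
X, y = make_classification(
    n_samples=600, n_features=40, n_informative=8, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# Encoder: boosted trees (scikit-learn used here in place of XGBoost).
encoder = GradientBoostingClassifier(
    n_estimators=30, max_depth=3, random_state=0
)
encoder.fit(X_train, y_train)


def leaf_indices(model, X):
    """Return the leaf reached in each tree, shape (n_samples, n_estimators)."""
    # For binary classification apply() returns (n_samples, n_estimators, 1).
    return model.apply(X)[:, :, 0]


# One-hot encode leaf memberships; leaves unseen in training are ignored.
one_hot = OneHotEncoder(handle_unknown="ignore")
Z_train = one_hot.fit_transform(leaf_indices(encoder, X_train))
Z_test = one_hot.transform(leaf_indices(encoder, X_test))

# Head: a lightweight MLP trained on the encoded features.
head = MLPClassifier(hidden_layer_sizes=(16,), max_iter=500, random_state=0)
head.fit(Z_train, y_train)


def report(name, model, X_eval):
    """Print and return (AUC, balanced accuracy) on the held-out split."""
    auc = roc_auc_score(y_test, model.predict_proba(X_eval)[:, 1])
    bal = balanced_accuracy_score(y_test, model.predict(X_eval))
    print(f"{name}: AUC={auc:.3f}, balanced accuracy={bal:.3f}")
    return auc, bal


baseline = LogisticRegression(max_iter=1000).fit(X_train, y_train)
baseline_scores = report("logistic regression", baseline, X_test)
hybrid_scores = report("tree encoder + MLP head", head, Z_test)
```

In this sketch the head is trained on the same rows used to fit the encoder, which can make the encoded features look more informative than they are. Fitting the encoder and the head on disjoint training folds is one way to reduce that risk.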
