Subject-Adaptive Sparse Linear Models for Interpretable Personalized Health Prediction from Multimodal Lifelog Data
Dohyun Bu, Jisoo Han, Soohwa Kwon, Yulim So, Jong-Seok Lee
TL;DR
The paper tackles personalized health prediction from multimodal lifelog data by proposing SASL, an interpretable framework that separates global and subject-specific effects within a sparse linear model. It combines backward elimination via nested $F$-tests with a regression-then-thresholding strategy that optimizes macro-F1 for ordinal targets, and it improves hard-to-predict targets through confidence-based LightGBM gating. Evaluated on the CH-2025 lifelog dataset, SASL uses far fewer parameters and is more transparent than black-box approaches while remaining competitive in accuracy, with macro-F1 reaching up to 0.6387 on a public test split. By exposing subject-level adjustments and interpretable feature effects, the method offers clinically actionable insights.
Abstract
Improved prediction of personalized health outcomes -- such as sleep quality and stress -- from multimodal lifelog data could have meaningful clinical and practical implications. However, state-of-the-art models, primarily deep neural networks and gradient-boosted ensembles, sacrifice interpretability and fail to adequately address the significant inter-individual variability inherent in lifelog data. To overcome these challenges, we propose the Subject-Adaptive Sparse Linear (SASL) framework, an interpretable modeling approach explicitly designed for personalized health prediction. SASL integrates ordinary least squares regression with subject-specific interactions, systematically distinguishing global from individual-level effects. We employ an iterative backward feature elimination method based on nested $F$-tests to construct a sparse and statistically robust model. Additionally, recognizing that health outcomes often represent discretized versions of continuous processes, we develop a regression-then-thresholding approach specifically designed to maximize macro-averaged F1 scores for ordinal targets. For intrinsically challenging predictions, SASL selectively incorporates outputs from compact LightGBM models through confidence-based gating, enhancing accuracy without compromising interpretability. Evaluations conducted on the CH-2025 dataset -- which comprises roughly 450 daily observations from ten subjects -- demonstrate that the hybrid SASL-LightGBM framework achieves predictive performance comparable to that of sophisticated black-box methods, but with significantly fewer parameters and substantially greater transparency, thus providing clear and actionable insights for clinicians and practitioners.
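The regression-then-thresholding idea can be illustrated with a minimal sketch: a continuous regression output is cut into ordinal classes, and the cut points are chosen to maximize macro-F1 on labeled data. Everything below is an assumption made for illustration and is not taken from the paper: the function names (`macro_f1`, `discretize`, `fit_thresholds`), the coordinate-wise search over midpoints between observed scores, and the toy data.

```python
from bisect import bisect_right


def macro_f1(y_true, y_pred, n_classes):
    """Unweighted mean of per-class F1; a class with no support and no predictions scores 0."""
    total = 0.0
    for c in range(n_classes):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denom = 2 * tp + fp + fn
        total += (2 * tp / denom) if denom else 0.0
    return total / n_classes


def discretize(scores, thresholds):
    """Map each continuous score to a class index: the number of (sorted) thresholds <= score."""
    return [bisect_right(thresholds, s) for s in scores]


def fit_thresholds(scores, y_true, n_classes, n_sweeps=5):
    """Hypothetical coordinate-wise search for cut points that maximize macro-F1.

    Candidates are midpoints between consecutive distinct scores, so at least
    two distinct scores are required. Thresholds are kept sorted throughout.
    """
    uniq = sorted(set(scores))
    candidates = [(a + b) / 2 for a, b in zip(uniq, uniq[1:])]
    if not candidates:
        raise ValueError("need at least two distinct scores")
    m = len(candidates)
    # Start from roughly evenly spaced candidates.
    thresholds = [candidates[(k + 1) * m // n_classes] for k in range(n_classes - 1)]
    best = macro_f1(y_true, discretize(scores, thresholds), n_classes)
    for _ in range(n_sweeps):
        improved = False
        for k in range(len(thresholds)):
            lo = thresholds[k - 1] if k > 0 else float("-inf")
            hi = thresholds[k + 1] if k + 1 < len(thresholds) else float("inf")
            for c in candidates:
                if not lo <= c <= hi:
                    continue  # moving here would break the ordering
                trial = thresholds[:k] + [c] + thresholds[k + 1:]
                score = macro_f1(y_true, discretize(scores, trial), n_classes)
                if score > best:
                    best, thresholds, improved = score, trial, True
        if not improved:
            break
    return thresholds, best


# Toy example: regression outputs for a 3-level ordinal target.
scores = [0.1, 0.4, 0.5, 0.9, 1.2, 1.6, 1.8, 2.3]
labels = [0, 0, 0, 1, 1, 1, 2, 2]
thresholds, best = fit_thresholds(scores, labels, n_classes=3)
predicted = discretize(scores, thresholds)
```

In practice the thresholds would be fitted on training or validation predictions and then applied unchanged to held-out data; the search above is a simple heuristic and does not guarantee a global optimum.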

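For reference, the nested $F$-test that drives backward elimination has a standard textbook form, sketched below. The notation is assumed for illustration and is not taken from the paper: $x_i$ is the feature vector for observation $i$, $s(i)$ its subject, $\beta$ the global coefficients, $\delta_s$ the subject-specific adjustments, $q$ the number of coefficients removed, $n$ the number of observations, and $p_{\text{full}}$ the number of parameters in the larger model.

```latex
% Assumed model form: global effects plus subject-specific interaction terms
y_i = x_i^\top \beta + x_i^\top \delta_{s(i)} + \varepsilon_i

% Nested F-test comparing a reduced model (q coefficients removed) with the full model
F = \frac{(\mathrm{RSS}_{\text{reduced}} - \mathrm{RSS}_{\text{full}}) / q}
         {\mathrm{RSS}_{\text{full}} / (n - p_{\text{full}})}
\;\sim\; F_{q,\; n - p_{\text{full}}} \quad \text{under } H_0
```

Under this reading, backward elimination would repeatedly drop the term (or group of terms) whose removal gives a non-significant $F$, refitting after each removal. The significance level and the grouping of terms used by the authors are not stated in this section.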