Subject-Adaptive Sparse Linear Models for Interpretable Personalized Health Prediction from Multimodal Lifelog Data

Dohyun Bu, Jisoo Han, Soohwa Kwon, Yulim So, Jong-Seok Lee

TL;DR

The paper tackles personalized health prediction from multimodal lifelog data by proposing SASL, an interpretable framework that separates global and subject-specific effects within a sparse linear model. It combines backward elimination via nested $F$-tests with a regression-then-thresholding strategy that optimizes macro-F1 for ordinal targets, and it improves challenging predictions through confidence-based LightGBM gating. Evaluated on the CH-2025 lifelog dataset, SASL achieves competitive accuracy with far fewer parameters and greater transparency, with reported macro-F1 gains of up to 0.6387 on a public test split. By exposing subject-level adjustments and interpretable feature effects, the method offers clinically actionable insights while remaining competitive with black-box approaches.
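To make the regression-then-thresholding idea concrete, the sketch below maps continuous regression scores to ordinal classes and picks the cut points that maximize macro-F1 on the training data. It is a minimal illustration, not the paper's procedure: the exhaustive search over midpoints, the function names (`macro_f1`, `apply_thresholds`, `fit_thresholds`), and the toy scores are all assumptions introduced here.

```python
from itertools import combinations


def macro_f1(y_true, y_pred, n_classes):
    """Unweighted mean of the per-class F1 scores."""
    scores = []
    for c in range(n_classes):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / n_classes


def apply_thresholds(scores, thresholds):
    """Assign each score the number of thresholds it reaches (0..K-1)."""
    return [sum(s >= t for t in thresholds) for s in scores]


def fit_thresholds(scores, y_true, n_classes):
    """Assumed search: try every set of K-1 midpoints between sorted scores."""
    uniq = sorted(set(scores))
    candidates = [(a + b) / 2 for a, b in zip(uniq, uniq[1:])]
    if len(candidates) < n_classes - 1:
        raise ValueError("not enough distinct scores to place the thresholds")
    best, best_f1 = None, -1.0
    for cuts in combinations(candidates, n_classes - 1):
        f1 = macro_f1(y_true, apply_thresholds(scores, cuts), n_classes)
        if f1 > best_f1:
            best, best_f1 = list(cuts), f1
    return best, best_f1


# Hypothetical regression outputs for a three-level ordinal target.
scores = [0.1, 0.4, 0.9, 1.2, 1.7, 2.1]
y_true = [0, 0, 1, 1, 2, 2]
cuts, f1 = fit_thresholds(scores, y_true, n_classes=3)
preds = apply_thresholds(scores, cuts)
```

The exhaustive search is only practical for small samples and few classes; with more data, a coordinate-wise or grid search over thresholds would be a natural substitute.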

Abstract

Improved prediction of personalized health outcomes -- such as sleep quality and stress -- from multimodal lifelog data could have meaningful clinical and practical implications. However, state-of-the-art models, primarily deep neural networks and gradient-boosted ensembles, sacrifice interpretability and fail to adequately address the significant inter-individual variability inherent in lifelog data. To overcome these challenges, we propose the Subject-Adaptive Sparse Linear (SASL) framework, an interpretable modeling approach explicitly designed for personalized health prediction. SASL integrates ordinary least squares regression with subject-specific interactions, systematically distinguishing global from individual-level effects. We employ an iterative backward feature elimination method based on nested $F$-tests to construct a sparse and statistically robust model. Additionally, recognizing that health outcomes often represent discretized versions of continuous processes, we develop a regression-then-thresholding approach specifically designed to maximize macro-averaged F1 scores for ordinal targets. For intrinsically challenging predictions, SASL selectively incorporates outputs from compact LightGBM models through confidence-based gating, enhancing accuracy without compromising interpretability. Evaluations conducted on the CH-2025 dataset -- which comprises roughly 450 daily observations from ten subjects -- demonstrate that the hybrid SASL-LightGBM framework achieves predictive performance comparable to that of sophisticated black-box methods, but with significantly fewer parameters and substantially greater transparency, thus providing clear and actionable insights for clinicians and practitioners.
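The backward elimination step described above can be sketched as follows. This is a hedged illustration under stated assumptions rather than the authors' implementation: it uses only slope interactions (no subject-specific intercepts), treats subject 0 as a reference level to keep the design matrix full rank, removes one column per iteration, and uses a significance level of 0.05. The helper names `rss` and `backward_eliminate` and the synthetic data are hypothetical.

```python
import numpy as np
from scipy import stats


def rss(X, y):
    """Residual sum of squares of the OLS fit of y on the columns of X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid)


def backward_eliminate(X, y, names, alpha=0.05, keep=("intercept",)):
    """Repeatedly drop the column whose nested F-test has the largest p > alpha."""
    active = list(range(X.shape[1]))
    n = len(y)
    while True:
        rss_full = rss(X[:, active], y)
        df_full = n - len(active)  # residual degrees of freedom of the full model
        worst, worst_p = None, alpha
        for j in active:
            if names[j] in keep:
                continue
            reduced = [k for k in active if k != j]
            # Nested F-test for dropping a single column (numerator df = 1).
            f_stat = (rss(X[:, reduced], y) - rss_full) / (rss_full / df_full)
            p = stats.f.sf(f_stat, 1, df_full)
            if p > worst_p:
                worst, worst_p = j, p
        if worst is None:  # every remaining column is significant
            return [names[j] for j in active]
        active.remove(worst)


# Synthetic data (assumption): three subjects, only subject 2 has an extra slope.
rng = np.random.default_rng(0)
subj = np.repeat([0, 1, 2], 30)
x = rng.normal(size=90)
y = 1.0 + 2.0 * x + 3.0 * x * (subj == 2) + rng.normal(scale=0.1, size=90)

# Global intercept and slope, plus feature-by-subject interaction columns.
X = np.column_stack([np.ones(90), x, x * (subj == 1), x * (subj == 2)])
names = ["intercept", "x", "x:s1", "x:s2"]
selected = backward_eliminate(X, y, names)
```

In this setup the global slope `x` and the interaction `x:s2` carry real signal and should survive, while `x:s1` is a candidate for removal; whether it is actually dropped depends on the noise draw and the chosen `alpha`.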

Paper Structure

This paper contains 12 sections, 7 equations, 3 figures, 1 table, 1 algorithm.

Figures (3)

  • Figure 1: Toy regression example after backward feature elimination. SASL (red solid) preserves a single subject-specific slope $\beta_{3}$ while discarding the remaining subject ID interactions, striking a compromise between the under-fitted global model $y=\mu+\beta x$ (brown dash-dot) and the over-parameterized full model (orange dashed dot). Here $\mu$ and $\beta$ denote the global intercept and coefficient shared by all subjects, whereas $\mu_{i}$ and $\beta_{i}$ are the additional intercept and slope for subject $i$. The term $x^{(i)}$ represents the interaction between the feature $x$ and the one-hot indicator for subject $i$, meaning that it equals $x$ for that subject and $0$ for others.
  • Figure 2: Z-score profiles for the two disagreement groups.
  • Figure 3: Linear coefficient profiles for each target.
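
For reference, the three models compared in Figure 1 can be written out using the caption's symbols. This is a reading of the caption rather than an equation quoted from the paper; the subject label $s$ and the indicator $\mathbb{1}[\cdot]$ are notation assumed here, with $x^{(i)} = x \cdot \mathbb{1}[s = i]$.

$$
\begin{aligned}
\text{Global model:} \quad & y = \mu + \beta x \\
\text{Full model:} \quad & y = \mu + \beta x + \sum_{i} \left( \mu_{i}\, \mathbb{1}[s = i] + \beta_{i}\, x^{(i)} \right) \\
\text{SASL after elimination:} \quad & y = \mu + \beta x + \beta_{3}\, x^{(3)}
\end{aligned}
$$

Under this reading, subject 3 follows the slope $\beta + \beta_{3}$, while every other subject falls back to the shared line $y = \mu + \beta x$.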