biniLasso: Automated cut-point detection via sparse cumulative binarization
Abdollah Safari, Hamed Halisaz, Peter Loewen
TL;DR
This work tackles data-driven cut-point detection in high-dimensional survival analysis. It introduces biniLasso, which applies cumulative binarization to continuous predictors within a Cox proportional hazards framework, and a sparse variant, miniLasso, which enforces sparsity through uniLasso with sign consistency. Both methods can detect multiple cut-points per feature, run substantially faster than the state-of-the-art binacox (2–8×), and achieve competitive predictive accuracy in extensive simulations. On three TCGA cancer datasets, biniLasso and miniLasso identify meaningful cut-points, give stable risk stratification, and often surpass binacox in interpretability and in performance (AIC, IBS, C-index), in comparisons that also include CGAM and continuous-covariate models. The approach generalizes to other GLMs, providing a practical, interpretable toolkit for high-dimensional prognostic modeling and risk stratification.
Abstract
We present biniLasso and its sparse variant (sparse biniLasso), novel methods for prognostic analysis of high-dimensional survival data that enable detection of multiple cut-points per feature. Our approach leverages the Cox proportional hazards model with two key innovations: (1) a cumulative binarization scheme with $L_1$-penalized coefficients operating on context-dependent cut-point candidates, and (2) for sparse biniLasso, additional uniLasso regularization to enforce sparsity while preserving univariate coefficient patterns. These innovations yield substantially improved interpretability, computational efficiency (4–11× faster than existing approaches), and prediction performance. Through extensive simulations, we demonstrate superior performance in cut-point detection, particularly in high-dimensional settings. Application to three genomic cancer datasets from TCGA confirms the methods' practical utility, with both variants showing enhanced risk prediction accuracy compared to conventional techniques.
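
To make the cumulative binarization idea concrete, the following is a minimal sketch in plain Python, not the authors' implementation. It assumes that candidate cut-points are taken as empirical quantiles and that each indicator is coded as 1 when the feature value strictly exceeds the candidate; the abstract only says the candidates are context-dependent, so both choices are assumptions. The helper names (`candidate_cut_points`, `cumulative_binarize`, `step_effect`, `selected_cut_points`) are hypothetical, and the coefficient vector is an illustrative placeholder rather than the output of a fitted $L_1$-penalized Cox model.

```python
def candidate_cut_points(values, n_bins):
    """Hypothetical helper: interior empirical quantiles as candidate cut-points."""
    ordered = sorted(values)
    n = len(ordered)
    cuts = []
    for k in range(1, n_bins):
        c = ordered[(k * n) // n_bins]
        if not cuts or c > cuts[-1]:  # skip duplicate candidates
            cuts.append(c)
    return cuts


def cumulative_binarize(values, cuts):
    """Row i, column k is 1 if values[i] > cuts[k], else 0 (assumed coding)."""
    return [[1 if x > c else 0 for c in cuts] for x in values]


def step_effect(x, cuts, betas):
    """Piecewise-constant effect: sum of the coefficients of all cut-points x exceeds."""
    return sum(b for c, b in zip(cuts, betas) if x > c)


def selected_cut_points(cuts, betas, tol=1e-8):
    """Candidates whose coefficient is nonzero, i.e. where the step function jumps."""
    return [c for c, b in zip(cuts, betas) if abs(b) > tol]


# Toy feature values (made up for illustration only).
x = [0.5, 1.2, 2.7, 3.1, 4.8, 5.5, 6.9, 7.4]
cuts = candidate_cut_points(x, n_bins=4)
Z = cumulative_binarize(x, cuts)

# Placeholder coefficients standing in for an L1-penalized fit on Z;
# the zero entry mimics a candidate that the penalty has removed.
betas = [0.8, 0.0, -0.3]
kept = selected_cut_points(cuts, betas)
effect_at_max = step_effect(7.4, cuts, betas)
```

Under this coding, each coefficient represents the jump in the effect at its candidate, so a coefficient shrunk exactly to zero means no cut-point at that location, and the surviving candidates are read off as the detected cut-points. In an actual analysis the columns of `Z` for all features would be passed to a penalized Cox solver, a step this sketch deliberately omits.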
