Sparse-Input Neural Network using Group Concave Regularization
Bin Luo, Susan Halabi
TL;DR
The paper tackles high-dimensional predictive modeling by jointly selecting input features and estimating nonlinear functions via sparse-input neural networks with group concave regularization. It introduces a ridge-stabilized objective that applies a concave group penalty to each input node's outgoing weights, coupled with a backward path-wise optimization that produces stable solution paths. The authors establish non-asymptotic estimation and prediction guarantees and prove an oracle property under standard high-dimensional conditions. Extensive simulations and real-data applications (continuous, binary, and time-to-event outcomes) demonstrate improved feature selection and competitive predictive performance. The work has practical implications for interpretable nonlinear modeling in high-dimensional data analysis, offering a computationally efficient path-wise training strategy and theoretical backing for variable selection consistency in neural networks.
Abstract
Simultaneous feature selection and non-linear function estimation is challenging, especially in high-dimensional settings where the number of variables exceeds the available sample size. In this article, we investigate the problem of feature selection in neural networks. Although the group least absolute shrinkage and selection operator (LASSO) has been utilized to select variables for learning with neural networks, it tends to select unimportant variables into the model to compensate for its over-shrinkage. To overcome this limitation, we propose a framework of sparse-input neural networks using group concave regularization for feature selection in both low-dimensional and high-dimensional settings. The main idea is to apply a proper concave penalty to the $\ell_2$ norm of the weights on all outgoing connections of each input node, and thus obtain a neural net that uses only a small subset of the original variables. In addition, to tackle the challenge of complex optimization landscapes, we develop an effective algorithm based on backward path-wise optimization that yields stable solution paths. We provide a rigorous theoretical analysis of the proposed framework, establishing finite-sample guarantees for both variable selection consistency and prediction accuracy. These results are supported by extensive simulation studies and real data applications, which demonstrate the finite-sample performance of the estimator in feature selection and prediction across continuous, binary, and time-to-event outcomes.
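The grouping idea in the abstract can be illustrated with a minimal sketch. The abstract does not name a specific concave penalty, so the code below assumes the minimax concave penalty (MCP) as one common choice. The names `mcp`, `group_concave_penalty`, `selected_inputs`, and the parameters `lam` and `gamma` are hypothetical and are not taken from the paper. The first-layer weight matrix is assumed to have one row per input variable, so each row holds that input node's outgoing weights.

```python
import numpy as np

def mcp(t, lam, gamma=3.0):
    """Minimax concave penalty at t >= 0 (assumed penalty, not named in the abstract).

    Rises like lam * t near zero, then flattens to the constant
    gamma * lam**2 / 2 once t exceeds gamma * lam, so large groups
    are not shrunk further.
    """
    t = np.asarray(t, dtype=float)
    return np.where(
        t <= gamma * lam,
        lam * t - t**2 / (2.0 * gamma),
        0.5 * gamma * lam**2,
    )

def group_concave_penalty(W1, lam, gamma=3.0):
    """Sum the concave penalty over the l2 norm of each input's outgoing weights.

    W1 has shape (n_inputs, n_hidden); row j is the group for input j.
    """
    group_norms = np.linalg.norm(W1, axis=1)
    return float(np.sum(mcp(group_norms, lam, gamma)))

def selected_inputs(W1, tol=1e-8):
    """Indices of inputs whose outgoing weight group is not (numerically) zero."""
    return np.flatnonzero(np.linalg.norm(W1, axis=1) > tol)

# Toy first-layer weights: input 0 is fully pruned, inputs 1 and 2 remain active.
W1 = np.array([[0.0, 0.0, 0.0],
               [3.0, 4.0, 0.0],
               [0.1, 0.0, 0.0]])
penalty = group_concave_penalty(W1, lam=1.0, gamma=3.0)
active = selected_inputs(W1)
```

In this toy example the group with norm 5 lies in the flat region of the penalty and contributes a constant, whereas a group lasso term would keep growing with the norm. That difference is the over-shrinkage the abstract refers to. In a full training objective, this penalty would be added to the loss, and an input is dropped when its entire row of weights is driven to zero.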
