Logistic lasso regression with nearest neighbors for gradient-based dimension reduction
Touqeer Ahmad, François Portier, Gilles Stupfler
TL;DR
Let $\pi(x)=P(Y=1\mid X=x)=g(\beta^T x)$, where the columns of $\beta$ span a $d$-dimensional central subspace. The paper develops a localized nearest-neighbor, $\ell_1$-penalized logistic approach to estimate the gradient $\nabla\ell(x)$ of the log-odds $\ell(x)=\log(\pi(x)/(1-\pi(x)))$, and obtains pointwise convergence rates that are minimax-optimal up to log factors. The central subspace is then estimated by aggregating gradient estimates into the outer-product matrix $\mathbb{E}[\nabla\ell(X)\nabla\ell(X)^T]$ and taking the span of its leading eigenvectors; the dimension $d$ is selected by cross-validation on the misclassification risk. The authors extend the framework to multi-class GLMs and validate the method with simulations and real data, reporting better subspace recovery and predictive performance than existing competitors. The result is a gradient-based, sparsity-aware, nearest-neighbor route to dimension reduction in high-dimensional binary classification, with theoretical guarantees and a practical, cross-validated choice of dimension.
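The reason gradients of the log-odds recover the central subspace can be made explicit with a short chain-rule computation. The sketch below assumes that $g$ is differentiable with values in $(0,1)$ and that $\beta$ is a $p\times d$ matrix; the symbols $p$ (ambient dimension) and $h$ (log-odds as a function of the reduced covariate) are notation introduced here for illustration and are not taken from the paper.

```latex
\begin{align*}
\ell(x) &= h(\beta^T x), \qquad h(u) = \log\frac{g(u)}{1-g(u)}, \quad u \in \mathbb{R}^d, \\
\nabla \ell(x) &= \beta\, \nabla h(\beta^T x) \in \operatorname{span}(\beta), \\
\mathbb{E}\big[\nabla\ell(X)\nabla\ell(X)^T\big]
  &= \beta\, \mathbb{E}\big[\nabla h(\beta^T X)\, \nabla h(\beta^T X)^T\big]\, \beta^T .
\end{align*}
```

Under these assumptions the outer-product matrix has rank at most $d$ and its column space is contained in $\operatorname{span}(\beta)$. The two spaces coincide when the inner $d\times d$ matrix is invertible, in which case the $d$ leading eigenvectors span the central subspace.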
Abstract
This paper investigates a new approach to estimating the gradient of the conditional probability given the covariates in the binary classification framework. The proposed approach consists of fitting a localized nearest-neighbor logistic model with an $\ell_1$-penalty in order to cope with possibly high-dimensional covariates. Our theoretical analysis shows that the pointwise convergence rate of the gradient estimator is optimal under very mild conditions. Moreover, using an outer product of such gradient estimates at several points in the covariate space, we establish the rate of convergence for estimating the so-called central subspace, a well-known object that enables dimension reduction within the covariate space. Our implementation uses cross-validation on the misclassification rate to estimate the dimension of this subspace. We find that the proposed approach outperforms existing competitors in synthetic and real data applications.
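The pipeline described above (local $\ell_1$-penalized logistic fit on nearest neighbors, then an outer product of gradient estimates, then an eigendecomposition) can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the Euclidean neighbor search, the proximal-gradient solver, the unpenalized intercept, and the values of `k` and `lam` are all hypothetical choices made here, and the cross-validated selection of the dimension is omitted (`d` is passed in directly).

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def soft_threshold(v, t):
    # Proximal operator of t * ||.||_1 (coordinate-wise shrinkage).
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def local_lasso_logistic_gradient(X, y, x0, k, lam, n_iter=300):
    """Hypothetical helper: slope of an l1-penalized logistic fit on the
    k nearest neighbors of x0, used as an estimate of the log-odds gradient."""
    dist = np.linalg.norm(X - x0, axis=1)
    idx = np.argsort(dist)[:k]
    Z = X[idx] - x0                      # center at x0: slope ~ gradient at x0
    t = y[idx].astype(float)
    A = np.hstack([np.ones((k, 1)), Z])
    # Step 1/L, where L = ||A||_2^2 / (4k) bounds the Lipschitz constant
    # of the gradient of the averaged logistic loss.
    step = 4.0 * k / (np.linalg.norm(A, 2) ** 2)
    a, b = 0.0, np.zeros(X.shape[1])
    for _ in range(n_iter):
        resid = sigmoid(a + Z @ b) - t
        a -= step * resid.mean()         # intercept is left unpenalized
        b = soft_threshold(b - step * (Z.T @ resid) / k, step * lam)
    return b

def estimate_central_subspace(X, y, d, k, lam, n_points=None, seed=0):
    """Hypothetical helper: average the outer products of local gradient
    estimates and return the d leading eigenvectors as a (p, d) basis."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    if n_points is None:
        pts = np.arange(n)
    else:
        pts = rng.choice(n, size=n_points, replace=False)
    M = np.zeros((p, p))
    for i in pts:
        grad = local_lasso_logistic_gradient(X, y, X[i], k, lam)
        M += np.outer(grad, grad)
    M /= len(pts)
    _, eigvecs = np.linalg.eigh(M)       # eigenvalues in ascending order
    return eigvecs[:, ::-1][:, :d]

if __name__ == "__main__":
    # Synthetic single-index example: only the first coordinate matters.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(400, 5))
    y = (rng.uniform(size=400) < sigmoid(3.0 * X[:, 0])).astype(int)
    B = estimate_central_subspace(X, y, d=1, k=80, lam=0.05, n_points=40, seed=1)
    print(np.round(B[:, 0], 3))
```

In this sketch the covariates are centered at the query point so that the fitted slope plays the role of the gradient at that point and the intercept absorbs the local log-odds. In practice `k`, `lam`, and `d` would need to be tuned, for instance by the cross-validation on misclassification rate mentioned in the abstract.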
