Logistic lasso regression with nearest neighbors for gradient-based dimension reduction

Touqeer Ahmad, François Portier, Gilles Stupfler

TL;DR

Let $\pi(x)=P(Y=1|X=x)=g(\beta^T x)$ with a $d$-dimensional central subspace spanned by the columns of $\beta$. The paper develops a localized nearest-neighbor penalized logistic approach to estimate the gradient $\nabla\ell(x)$ of $\ell(x)=\log(\pi(x)/(1-\pi(x)))$, obtaining pointwise convergence rates that are minimax-optimal up to log factors. By aggregating gradient directions via the outer product $\mathbb{E}[\nabla\ell(X)\nabla\ell(X)^T]$ and projecting onto its leading eigenvectors, the central subspace is estimated; the dimension $d$ is selected through cross-validation on misclassification risk. The authors extend the framework to multi-class GLMs and validate the method with extensive simulations and real data, showing improved subspace recovery and predictive performance over existing competitors. This gradient-based, sparsity-aware, nearest-neighbor approach offers a scalable and effective pathway for dimension reduction in high-dimensional binary classification tasks, with strong theoretical guarantees and practical cross-validation-based dimension selection.
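
To make the local estimation step concrete, here is a minimal sketch of a nearest-neighbor, $\ell_1$-penalized local logistic fit at a single point. It is an illustration under assumptions and not the authors' implementation: the function name `local_logistic_lasso_gradient`, the plain proximal-gradient solver, the unweighted neighbors, the unpenalized intercept, and the values of `k` and `lam` are all hypothetical choices. The fitted slope vector serves as an estimate of $\nabla\ell(x_0)$.

```python
import numpy as np

def local_logistic_lasso_gradient(X, y, x0, k, lam, n_iter=500):
    """Sketch: fit a + b^T (X_i - x0) by l1-penalized logistic regression
    on the k nearest neighbours of x0 and return (a, b).
    The slope b is read as an estimate of the gradient of the logit at x0."""
    # Select the k nearest neighbours of x0 in Euclidean distance.
    dist = np.linalg.norm(X - x0, axis=1)
    idx = np.argsort(dist)[:k]
    Z = X[idx] - x0              # centred local covariates
    t = y[idx].astype(float)     # labels in {0, 1}

    # Step size 1/L, where L bounds the curvature of the mean logistic loss.
    design = np.column_stack([np.ones(k), Z])
    L = 0.25 * np.linalg.norm(design, 2) ** 2 / k
    step = 1.0 / L

    a, b = 0.0, np.zeros(X.shape[1])
    for _ in range(n_iter):
        prob = 1.0 / (1.0 + np.exp(-(a + Z @ b)))
        resid = prob - t
        a -= step * resid.mean()                  # intercept is not penalized (assumption)
        b -= step * (Z.T @ resid) / k
        # Soft-thresholding: proximal step for the penalty lam * ||b||_1.
        b = np.sign(b) * np.maximum(np.abs(b) - step * lam, 0.0)
    return a, b

# Synthetic check: the logit is 3 * x_1, so the gradient points along the first axis.
rng = np.random.default_rng(0)
n, p = 4000, 5
X = rng.uniform(-1.0, 1.0, size=(n, p))
true_logit = 3.0 * X[:, 0]
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-true_logit))).astype(int)
a_hat, b_hat = local_logistic_lasso_gradient(X, y, np.zeros(p), k=400, lam=0.01)
```

In this toy setting `b_hat` should be dominated by its first coordinate, with the penalty shrinking the remaining coordinates toward zero. The paper's estimator may rescale the local covariates or normalize the penalty differently; those details are not visible in this summary.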

Abstract

This paper investigates a new approach to estimate the gradient of the conditional probability given the covariates in the binary classification framework. The proposed approach consists in fitting a localized nearest-neighbor logistic model with $\ell_1$-penalty in order to cope with possibly high-dimensional covariates. Our theoretical analysis shows that the pointwise convergence rate of the gradient estimator is optimal under very mild conditions. Moreover, using an outer product of such gradient estimates at several points in the covariate space, we establish the rate of convergence for estimating the so-called central subspace, a well-known object that makes dimension reduction within the covariate space possible. Our implementation uses cross-validation on the misclassification rate to estimate the dimension of this subspace. We find that the proposed approach outperforms existing competitors in synthetic and real data applications.
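
The outer-product step described in the abstract can be sketched as follows. This is a hedged illustration: the helper names `central_subspace_basis` and `projector_distance` are hypothetical, and the Frobenius distance between orthogonal projectors is an assumed way of measuring subspace error, since this summary does not define the distance used in the paper. The gradient estimates are taken as given, one per row.

```python
import numpy as np

def central_subspace_basis(gradients, d):
    """Leading d eigenvectors of the averaged outer product of gradient estimates."""
    G = np.asarray(gradients, dtype=float)   # shape (m, p), one gradient per row
    M = G.T @ G / G.shape[0]                 # empirical version of E[grad grad^T]
    eigval, eigvec = np.linalg.eigh(M)       # eigenvalues in ascending order
    order = np.argsort(eigval)[::-1][:d]     # indices of the d largest eigenvalues
    return eigvec[:, order]                  # shape (p, d), orthonormal columns

def projector_distance(B1, B2):
    """Frobenius distance between the orthogonal projectors onto span(B1) and span(B2).
    Assumed metric, chosen for illustration."""
    P1 = B1 @ np.linalg.pinv(B1)
    P2 = B2 @ np.linalg.pinv(B2)
    return np.linalg.norm(P1 - P2, "fro")

# Synthetic check: noisy gradients that lie close to a known 2-dimensional subspace.
rng = np.random.default_rng(0)
beta, _ = np.linalg.qr(rng.normal(size=(6, 2)))   # orthonormal basis of the true subspace
grads = rng.normal(size=(200, 2)) @ beta.T + 0.01 * rng.normal(size=(200, 6))
basis_hat = central_subspace_basis(grads, d=2)
dist = projector_distance(basis_hat, beta)
```

Comparing projectors rather than basis matrices makes the distance invariant to the choice of basis within each subspace, which is why it is a common choice for this kind of check.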

Paper Structure (21 sections, 11 theorems, 99 equations, 14 figures, 1 table, 2 algorithms)

Key Result

Theorem 1

Suppose that conditions \ref{cond:new1} and \ref{cond:new2} are fulfilled. If $k:=k_n \to \infty$ and $\lambda := \lambda_n$ are such that $k^{1+p/2}/n\to\infty$, $k^{1+p/4}/n$ is bounded and $n \lambda^p / k^{1+p/2} \to c\in [0,\infty)$, then we have with $W_n(x) \stackrel{\mathrm{d}}{\longrightarrow} \mathcal{N}(0,\Gamma(x))$ and
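
As a worked check of the rate conditions on $k$ stated above, consider the polynomial choice $k_n = n^{a}$ with $a \in (0,1)$. This parametrization is an illustrative assumption and is not taken from the paper.

$$
\frac{k^{1+p/2}}{n} = n^{a(1+p/2)-1} \to \infty \iff a > \frac{2}{p+2},
\qquad
\frac{k^{1+p/4}}{n} = n^{a(1+p/4)-1} \text{ bounded} \iff a \le \frac{4}{p+4}.
$$

Since $2/(p+2) < 4/(p+4)$ for every $p>0$, the admissible range $2/(p+2) < a \le 4/(p+4)$ is non-empty; for $p=8$ it reads $1/5 < a \le 1/3$. The third condition, $n\lambda^p/k^{1+p/2} \to c \in [0,\infty)$, then requires $\lambda_n^p$ to be at most of order $k^{1+p/2}/n = n^{a(1+p/2)-1}$, and the case $c=0$ covers any smaller penalty, including $\lambda_n = 0$.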

Figures (14)

  • Figure 1: Simulation study -- Distance to the central subspace (left, on the log scale) and misclassification risk (right) in Example 1 (top) and Example 2 (bottom), averaged over $N=1000$ replications in each situation. The covariate has dimension $p=8$ and the dimension $d$ is chosen as the dimension of the correct population central subspace (i.e. $d=1$ in Example 1 and $d=2$ in Example 2). In the right-hand panels, the misclassification risk represented relates to the nearest neighbor majority vote classifier using the set of projected covariates on the estimated subspace produced by each method; in addition, "Full" denotes this classifier on the full, non-projected set of covariates, and "Oracle" denotes this classifier using the covariates projected on the correct population central subspace.
  • Figure 2: Simulation study -- Distance to the central subspace (left) and misclassification risk (right) in Example 3 (top) and Example 4 (bottom), averaged over $N=1000$ replications in each situation. The covariate has dimension $p=8$ and the dimension $d$ is chosen as the dimension of the correct population central subspace (i.e. $d=2$ in Example 3 and $d=3$ in Example 4). In the right-hand panels, the misclassification risk represented relates to the nearest neighbor majority vote classifier using the set of projected covariates on the estimated subspace produced by each method; in addition, "Full" denotes this classifier on the full, non-projected set of covariates, and "Oracle" denotes this classifier using the covariates projected on the correct population central subspace.
  • Figure 3: Simulation study -- Distance to the central subspace (left) and misclassification risk (right) in Example 4, averaged over $N=1000$ replications of a sample of size $n=1000$, as a function of the dimension $d\in \{1,\ldots,6\}$ of the estimated central subspace and $p\in \{8,16,32,64\}$ of the full covariate space. In the right-hand panels, the red dashed line corresponds to the nearest-neighbor classifier with $d=p$, and the blue dashed line corresponds to this classifier using the covariates projected on the correct population central subspace.
  • Figure 4: Simulation study -- Left panel: Dimension selection through Algorithm \ref{alg:feature}, where the number indicated above the curve gives the number of times the dimension selected in the (absolute or relative) majority of cases was chosen. Right panel: Misclassification risk of the nearest-neighbor classifier with, from left to right, $d=p$ (red bar), the covariates projected on the correct population central subspace (blue bar), the central subspace estimated using the non-penalized LLO($\lambda=0$) method under correct specification of the dimension (green bar) and with the dimension estimated by cross-validation (purple bar), and the central subspace estimated using the penalized LLO($\lambda>0$) method under correct specification of the dimension (orange bar) and with the dimension estimated by cross-validation (yellow bar). Both panels are produced using $N=1000$ independent replications of a sample of size $n=1000$ and dimensions $p\in \{8,16,32,64\}$ of the full covariate space are considered.
  • Figure 5: Real data analysis -- Top panels: Dimension selection through cross-validation for LLO($\lambda=0$) and LLO($\lambda>0$), using the nearest neighbor classifier. Bottom panels: Estimated misclassification risk related to the nearest neighbor classifier on the set of projected covariates on the estimated subspace produced by each method for a dimension $d$ of the central subspace in $\{2, 3, 4, 5, 10\}$, with 95% asymptotic Gaussian Wald-type confidence intervals (the sample size used is the size of the testing set).
  • ...and 9 more figures
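
The captions above evaluate each method through a nearest neighbor majority vote classifier applied to projected covariates, and select the dimension by cross-validated misclassification risk. A minimal sketch of that evaluation loop follows. It is not the paper's algorithm: the function names, the number of neighbors `k=15`, the five folds, and the use of a fixed matrix of candidate directions (rather than re-estimating the subspace inside each fold) are simplifying assumptions.

```python
import numpy as np

def knn_predict(X_train, y_train, X_test, k):
    """Majority vote among the k nearest training points (labels in {0, 1})."""
    d2 = ((X_test[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=2)
    nn = np.argsort(d2, axis=1)[:, :k]
    return (y_train[nn].mean(axis=1) > 0.5).astype(int)

def cv_select_dimension(X, y, directions, dims, k=15, n_folds=5, seed=0):
    """Pick d minimizing the cross-validated misclassification rate of the
    k-NN classifier on X projected onto the first d columns of `directions`."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), n_folds)
    risks = {}
    for d in dims:
        Z = X @ directions[:, :d]            # projected covariates
        errs = []
        for test_idx in folds:
            train_mask = np.ones(len(y), dtype=bool)
            train_mask[test_idx] = False
            pred = knn_predict(Z[train_mask], y[train_mask], Z[test_idx], k)
            errs.append(np.mean(pred != y[test_idx]))
        risks[d] = float(np.mean(errs))
    best = min(risks, key=risks.get)
    return best, risks

# Synthetic check: the label depends on the first two coordinates only.
rng = np.random.default_rng(1)
X = rng.normal(size=(600, 6))
y = (X[:, 0] ** 2 + X[:, 1] ** 2 < 1.0).astype(int)
best_d, cv_risk = cv_select_dimension(X, y, np.eye(6), dims=[1, 2, 3, 4])
```

In this toy example, projecting on one coordinate discards half of the signal, so the estimated risk for `d=1` should be clearly higher than for `d=2`. The same folds are reused for every candidate dimension so that the risks are compared on identical splits.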

Theorems & Definitions (17)

  • Theorem 1: Convergence of nearest-neighbor penalized local logistic regression estimators
  • Corollary 2: Rate of convergence of the gradient estimator
  • Corollary 3
  • Lemma 1
  • Proof of Lemma \ref{lem:pollard}
  • Theorem 4: Central limit theorem for nearest-neighbor estimators
  • Lemma 2: portier2021nearest
  • Lemma 3: Tightness and weak convergence of $Z_n$
  • Proof of Lemma \ref{lem:tightness}
  • Proof of Theorem \ref{theo:tightness}
  • ...and 7 more