Exact variable selection in sparse nonparametric models

Natalia Stepanova, Marie Turcicova

TL;DR

An adaptive selection procedure is introduced that identifies exactly all nonzero $k$-variate components of $f$ in the asymptotically minimax sense with respect to the Hamming risk.

Abstract

We study the problem of adaptive variable selection in a Gaussian white noise model of intensity $\varepsilon$ under certain sparsity and regularity conditions on an unknown regression function $f$. The $d$-variate regression function $f$ is assumed to be a sum of functions each depending on a smaller number $k$ of variables ($1 \leq k \leq d$). These functions are unknown to us and only a few of them are nonzero. We assume that $d=d_\varepsilon \to \infty$ as $\varepsilon \to 0$ and consider the cases when $k$ is fixed and when $k=k_\varepsilon \to \infty$, $k=o(d)$ as $\varepsilon \to 0$. In this work, we introduce an adaptive selection procedure that, under some model assumptions, identifies exactly all nonzero $k$-variate components of $f$. In addition, we establish conditions under which exact identification of the nonzero components is impossible. These conditions ensure that the proposed selection procedure is the best possible in the asymptotically minimax sense with respect to the Hamming risk.
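
For orientation, the setting described in the abstract can be written schematically as follows. This is only a sketch in standard notation, not the paper's exact formulation: the domain $[0,1]^d$, the Wiener sheet $W$, the indicator representation with $\eta_u \in \{0,1\}$, and the notation $t_u$ for the coordinates of $t$ indexed by $u$ are assumptions made here for illustration. The index set ${\cal U}_{k,d}$ (taken here to be all $k$-element subsets of $\{1,\ldots,d\}$) and the selector $\boldsymbol{\hat{\eta}}=(\hat{\eta}_u)_{u \in {\cal U}_{k,d}}$ follow the notation of Theorem 1.

$$
dX_\varepsilon(t) = f(t)\,dt + \varepsilon\,dW(t), \qquad t \in [0,1]^d,
\qquad
f(t) = \sum_{u \in {\cal U}_{k,d}} \eta_u\, f_u(t_u), \quad \eta_u \in \{0,1\}.
$$

Under these assumptions, the Hamming loss of a selector counts the components it misclassifies:

$$
|\boldsymbol{\hat{\eta}} - \boldsymbol{\eta}| = \sum_{u \in {\cal U}_{k,d}} |\hat{\eta}_u - \eta_u| .
$$

Exact selection then corresponds to the expected Hamming loss, taken in the worst case over the admissible class of $f$, tending to zero as $\varepsilon \to 0$.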

Paper Structure (10 sections, 6 theorems, 177 equations, 1 figure, 1 table)

Key Result

Theorem 1

Let $k\in\{1,\ldots,d\}$, $\beta\in(0,1)$, and $\sigma>0$ be fixed numbers, and let $d=d_\varepsilon\to \infty$ and $\log \binom{d}{k}= \mathcal{O}\left( \log \varepsilon^{-1} \right)$ as $\varepsilon\to 0$. Let the quantity $r_{\varepsilon,k}>0$ be such that Then the selector $\boldsymbol{\hat{\eta}}=(\hat{\eta}_u)_{u \in {\cal U}_{k,d} }$ given by (def:selector_for_beta_unknown) satisfies

Figures (1)

  • Figure 1: Bivariate components $f_u$ of the regression function $f$ defined by (def:functions_fu).

Theorems & Definitions (10)

  • Remark 1
  • Theorem 1
  • Theorem 2
  • Remark 2
  • Remark 3
  • Theorem 3
  • Theorem 4
  • Remark 4
  • Lemma 5
  • Lemma 6