Exact variable selection in sparse nonparametric models

Natalia Stepanova; Marie Turcicova

TL;DR

An adaptive selection procedure is introduced that identifies exactly all nonzero $k$-variate components of $f$ in the asymptotically minimax sense with respect to the Hamming risk.

Abstract

We study the problem of adaptive variable selection in a Gaussian white noise model of intensity $\varepsilon$ under certain sparsity and regularity conditions on an unknown regression function $f$. The $d$-variate regression function $f$ is assumed to be a sum of functions, each depending on a smaller number $k$ of variables ($1 \leq k \leq d$). These functions are unknown to us and only a few of them are nonzero. We assume that $d=d_\varepsilon \to \infty$ as $\varepsilon \to 0$ and consider two cases: $k$ fixed, and $k=k_\varepsilon \to \infty$ with $k=o(d)$ as $\varepsilon \to 0$. In this work, we introduce an adaptive selection procedure that, under some model assumptions, identifies exactly all nonzero $k$-variate components of $f$. In addition, we establish conditions under which exact identification of the nonzero components is impossible. These conditions ensure that the proposed selection procedure is the best possible in the asymptotically minimax sense with respect to the Hamming risk.
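
For orientation, the setup described in the abstract can be written out roughly as follows. This is a sketch in conventional notation, not a quotation from the paper: the index set $\mathcal{U}_{k,d}$ and the indicators $\eta_u$ are borrowed from the statement of Theorem 1, while the domain $[0,1]^d$, the Wiener sheet $W$, and the exact form of the observation equation and of the Hamming loss are assumptions based on the usual conventions for Gaussian white noise models.

```latex
% Assumed standard form of a Gaussian white noise model of intensity \varepsilon;
% the domain [0,1]^d and the Wiener sheet W are assumptions, not taken from the source.
\begin{align*}
  dX_\varepsilon(t) &= f(t)\,dt + \varepsilon\,dW(t), \qquad t \in [0,1]^d, \\
% Sparse additive structure: f is a sum of k-variate components f_u,
% where u ranges over the k-element subsets of \{1,\ldots,d\} and t_u = (t_j)_{j \in u};
% \eta_u = 1 marks a nonzero component, and only a few \eta_u equal 1.
  f(t) &= \sum_{u \in \mathcal{U}_{k,d}} \eta_u\, f_u(t_u), \qquad \eta_u \in \{0,1\}, \\
% Hamming loss of a selector \hat{\eta}: the number of misclassified components.
  |\hat{\boldsymbol{\eta}} - \boldsymbol{\eta}| &= \sum_{u \in \mathcal{U}_{k,d}} |\hat{\eta}_u - \eta_u| .
\end{align*}
```

Under this reading, the Hamming risk is the expectation of the Hamming loss. Exact selection in the minimax sense would then mean that the worst-case risk over the admissible class of functions tends to zero as $\varepsilon \to 0$.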

Paper Structure (10 sections, 6 theorems, 177 equations, 1 figure, 1 table)

Introduction
Regularity conditions
Problem statement
Notation
Construction of an exact selector
Main results
The case of fixed $k$
The case of growing $k$
Simulation study
Proof of Theorems

Key Result

Theorem 1

Let $k\in\{1,\ldots,d\}$, $\beta\in(0,1)$, and $\sigma>0$ be fixed numbers, and let $d=d_\varepsilon\to \infty$ and $\log \binom{d}{k}= \mathcal{O}\left( \log \varepsilon^{-1} \right)$ as $\varepsilon\to 0$. Let the quantity $r_{\varepsilon,k}>0$ be such that [...]. Then the selector $\boldsymbol{\hat{\eta}}=(\hat{\eta}_u)_{u \in \mathcal{U}_{k,d}}$ given by (def:selector_for_beta_unknown) satisfies [...].

Figures (1)

Figure 1: Bivariate components $f_u$ of the regression function $f$ defined by (\ref{def:functions_fu}).

Theorems & Definitions (10)

Remark 1
Theorem 1
Theorem 2
Remark 2
Remark 3
Theorem 3
Theorem 4
Remark 4
Lemma 5
Lemma 6
