Efficient Estimation of the Central Mean Subspace via Smoothed Gradient Outer Products

Gan Yuan; Mingyue Xu; Samory Kpotufe; Daniel Hsu

Abstract

We consider the problem of sufficient dimension reduction (SDR) for multi-index models. Estimators of the central mean subspace in prior work either have slow (nonparametric) convergence rates or rely on stringent distributional conditions (e.g., the covariate distribution $P_{\mathbf{X}}$ being elliptically symmetric). In this paper, we show that a fast parametric convergence rate of the form $C_d \cdot n^{-1/2}$ is achievable by estimating the \emph{expected smoothed gradient outer product}, for a general class of covariate distributions $P_{\mathbf{X}}$ that includes Gaussian and heavier-tailed distributions. When the link function is a polynomial of degree at most $r$ and $P_{\mathbf{X}}$ is the standard Gaussian, we show that the prefactor depends on the ambient dimension $d$ as $C_d \propto d^r$.
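As a rough illustration of the gradient-outer-product idea, the sketch below assumes standard Gaussian covariates $P_{\mathbf{X}} = \mathcal{N}(\mathbf{0}, \mathbf{I}_d)$ and a smoothing scale fixed at 1; it is not the paper's estimator, which covers heavier-tailed $P_{\mathbf{X}}$ and a general $h$. For each random center $\boldsymbol{\theta} \sim \mathcal{N}(\mathbf{0}, \sigma_\theta^2 \mathbf{I}_d)$, it reweights the sample toward $\mathcal{N}(\boldsymbol{\theta}, \mathbf{I}_d)$, applies Stein's lemma to estimate the Gaussian-smoothed gradient of the regression function at $\boldsymbol{\theta}$, averages the outer products over $m$ centers, and returns the top-$k$ eigenvectors. The function names `smoothed_gradient` and `estimate_subspace`, the toy link function, and all parameter values are hypothetical.

```python
import numpy as np

# Illustrative sketch only (assumption: X ~ N(0, I_d), smoothing scale 1).
# This is NOT the paper's ESGOP estimator; it only shows the general idea.


def smoothed_gradient(X, Y, theta):
    """Estimate E_{Z ~ N(theta, I)}[grad g(Z)] from samples X ~ N(0, I).

    The likelihood ratio w(x) = exp(theta . x - |theta|^2 / 2) reweights
    N(0, I) into N(theta, I); Stein's lemma under N(theta, I) then gives
    E[(Z - theta) g(Z)] = E[grad g(Z)], where g(x) = E[Y | X = x].
    """
    w = np.exp(X @ theta - 0.5 * theta @ theta)   # shape (n,)
    return (w * Y) @ (X - theta) / X.shape[0]     # shape (d,)


def estimate_subspace(X, Y, k, m, sigma_theta, rng):
    """Top-k eigenvectors of an averaged outer product of smoothed gradients."""
    d = X.shape[1]
    M = np.zeros((d, d))
    for _ in range(m):
        # Hypothetical choice: random centers drawn from N(0, sigma_theta^2 I).
        theta = sigma_theta * rng.standard_normal(d)
        beta = smoothed_gradient(X, Y, theta)
        M += np.outer(beta, beta) / m
    _, eigvecs = np.linalg.eigh(M)   # eigenvalues in ascending order
    return eigvecs[:, -k:]           # orthonormal basis of the top-k eigenspace


# Toy multi-index model: Y = x_0 + x_1^2 + noise, so the CMS is span(e_0, e_1).
rng = np.random.default_rng(0)
n, d, k = 50_000, 5, 2
X = rng.standard_normal((n, d))
Y = X[:, 0] + X[:, 1] ** 2 + 0.1 * rng.standard_normal(n)
U = np.eye(d)[:, :k]
U_hat = estimate_subspace(X, Y, k=k, m=20, sigma_theta=0.4, rng=rng)
proj_err = np.linalg.norm(U_hat @ U_hat.T - U @ U.T)   # projection (Frobenius) distance
print(f"projection distance: {proj_err:.3f}")
```

The centers must spread out (here via $\sigma_\theta$) for the averaged outer product to pick up the $x_1^2$ direction, whose smoothed gradient $2\theta_1$ vanishes at $\boldsymbol{\theta} = \mathbf{0}$; larger $\sigma_\theta$, however, inflates the variance of the likelihood-ratio weights.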

Paper Structure (31 sections, 15 theorems, 69 equations, 3 figures, 1 table, 2 algorithms)

Introduction
Prior Theoretical Works
Inverse regression methods
Forward regression methods
Detailed Contributions
Outline of the Paper
Preliminaries
Notations
Central Mean Subspace
Main Results
Expected Smoothed Gradient Outer Product
Exhaustiveness of the ESGOP $\overline{\boldsymbol{M}}$
Average Smoothed Gradient Outer Product
Estimation of CMS
Estimating $\boldsymbol{\beta}_h(\boldsymbol{\theta}_j)$'s
...and 16 more sections

Key Result

Proposition 3.2

Suppose that Assumption \ref{assume:basic} holds, and that the link function $f$ satisfies $\mathbb{E}_{\boldsymbol{Z}\sim\mathcal{N}(\boldsymbol{0}_k, h^2 \boldsymbol{I}_k)} [f(\boldsymbol{Z})^2] < \infty$. Then, for $h, \sigma_\theta > 0$, the ESGOP $\overline{\boldsymbol{M}}$ …

Figures (3)

Figure 1: The subspace estimation error $d(\widehat{\boldsymbol{U}}, \boldsymbol{U})$ vs. the sampling budget $n$. Here, we fix the number of partitions $m = 15$ and $\sigma_{\theta} = h/\sqrt{20 + 10d}$. For each pair $(n, h)$, we run 10 replicates and plot the mean (dots) and the standard error (error bars) of the estimation error. When $P_{\boldsymbol{X}}$ is the standard Gaussian (left), the performance of Algorithm \ref{alg:main} is quite sensitive to the choice of $h$: the optimal $h$ is around the data variance, 1, and smaller (e.g., $h = 0.5$) or larger (e.g., $h = 1.2, 1.5$) values yield larger errors at the same budget. This optimal $h$ approximately coincides with the minimizer of $\mu_{\rho}$ over $h$ (cf. Proposition \ref{prop:var}). When $P_{\boldsymbol{X}}$ is the standard Cauchy, the method is more robust to the choice of $h$.
Figure 1: An example plot of the link function $f$ as defined in \eqref{eqn:save_link}. Here, reading off the plot, the event $\{Y > y\}$ corresponds to the interval $\{x : y \le f(x)\} = [z_1, z_2]$, where $z_1 = f_0^{-1}(y)$ and $z_2 = \nu^{-1}(z_1) = \nu^{-1}(f_0^{-1}(y))$.
Figure 2: The subspace estimation error $d(\widehat{\boldsymbol{U}},\boldsymbol{U})$ vs. the choice of $m$. For each $m$, we run 10 replicates and plot the mean (dots) and the standard error (error bars) of the subspace estimation error. When $m$ is small, the ASGOP $\widetilde{\boldsymbol{M}}$ is not guaranteed to be exhaustive, so only a proper subspace of $\mathcal{U}$ can be recovered, which results in a large subspace estimation error. When $m$ exceeds a certain threshold (cf. Corollary \ref{cor:exhaust}), the ASGOP $\widetilde{\boldsymbol{M}}$ is exhaustive with high probability, and the subspace estimation error has an upper bound that grows at the rate $O(\sqrt{m})$. The figure matches this reasoning: the error drops sharply in the regime where $m$ is small and grows gradually for large $m$.
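The error $d(\widehat{\boldsymbol{U}}, \boldsymbol{U})$ in these figures is a distance with optimal rotation (Definition 2.2). A minimal sketch, assuming the Frobenius norm and orthonormal $d \times k$ bases, computes $\min_{\boldsymbol{R}} \|\widehat{\boldsymbol{U}}\boldsymbol{R} - \boldsymbol{U}\|_F$ over orthogonal $k \times k$ matrices $\boldsymbol{R}$ by solving the orthogonal Procrustes problem with an SVD. The helper name `rotation_distance` is hypothetical, and the paper's exact choice of norm may differ.

```python
import numpy as np


def rotation_distance(U_hat, U):
    """min over orthogonal R of ||U_hat @ R - U||_F (assumed Frobenius norm).

    Orthogonal Procrustes: if U_hat^T U = A diag(s) B^T, the minimizer is R = A B^T.
    """
    A, _, Bt = np.linalg.svd(U_hat.T @ U)
    R = A @ Bt                      # optimal k x k rotation aligning U_hat with U
    return np.linalg.norm(U_hat @ R - U)


U = np.eye(4)[:, :2]

# Same subspace, different basis: the distance is zero up to rounding.
c, s = np.cos(0.7), np.sin(0.7)
U_rot = U @ np.array([[c, -s], [s, c]])
print(rotation_distance(U_rot, U))

# Tilt the first basis vector by angle phi toward e_2.
phi = 0.3
e = np.eye(4)
U_tilt = np.column_stack([np.cos(phi) * e[:, 0] + np.sin(phi) * e[:, 2], e[:, 1]])
print(rotation_distance(U_tilt, U))   # equals 2 * sin(phi / 2), about 0.299
```

Minimizing over $\boldsymbol{R}$ makes the distance depend only on the two subspaces, not on the particular orthonormal bases returned by an eigensolver, which is why the first example returns zero.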

Theorems & Definitions (42)

Definition 2.1: Mean dimension-reduction subspaces and central mean subspace Cook02cms
Definition 2.2: Distance with Optimal Rotation, Adapted from Chen2021spectral
Definition 3.1
Proposition 3.2: Exhaustiveness of $\overline{\boldsymbol{M}}$
Definition 3.3
Corollary 3.4: Exhaustiveness of $\widetilde{\boldsymbol{M}}$
Remark 3.5
Lemma 3.6: Stein's Lemma, chen2011stein
Proposition 3.7
Remark 3.8
...and 32 more
