
Learning a Single Index Model from Anisotropic Data with vanilla Stochastic Gradient Descent

Guillaume Braun, Minh Ha Quang, Masaaki Imaizumi

TL;DR

An analysis of the learning dynamics of vanilla SGD for the SIM with anisotropic input data shows that vanilla SGD automatically adapts to the data's covariance structure. The paper then derives upper and lower bounds on the sample complexity in terms of an effective dimension that is determined by the structure of the covariance matrix rather than by the input dimension.
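The effective dimension depends on the spectrum of the covariance matrix, not on the ambient dimension $d$. The paper's exact definition is not reproduced on this page. The sketch below therefore uses a standard stand-in, the effective rank $\mathrm{tr}(Q)/\|Q\|_{op}$, which is an assumption and may differ from the authors' notion (theirs may also depend on the target direction $w^*$). The power-law spectrum is a hypothetical choice. It shows how an anisotropic $Q$ can have an effective dimension far below $d$.

```python
import numpy as np

def effective_rank(Q: np.ndarray) -> float:
    """Effective rank tr(Q) / ||Q||_op of a symmetric PSD covariance matrix.

    Illustrative stand-in for an "effective dimension" only; the paper's
    own definition may differ.
    """
    eigvals = np.linalg.eigvalsh(Q)  # real eigenvalues in ascending order
    return float(eigvals.sum() / eigvals.max())

d = 1000
# Isotropic covariance: every direction has unit variance.
Q_iso = np.eye(d)
# Anisotropic covariance with power-law spectrum lambda_i = i^{-2} (hypothetical choice).
Q_aniso = np.diag(np.arange(1, d + 1, dtype=float) ** -2.0)

print(effective_rank(Q_iso))    # equals d (up to floating-point error)
print(effective_rank(Q_aniso))  # sum of i^{-2}, bounded by pi^2 / 6 regardless of d
```

For the isotropic case the effective rank equals $d$. For the decaying spectrum it stays below $\pi^2/6$ however large $d$ becomes.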

Abstract

We investigate the problem of learning a Single Index Model (SIM), a popular model for studying the ability of neural networks to learn features. The inputs are anisotropic Gaussians, and learning is done by training a single neuron with vanilla Stochastic Gradient Descent (SGD). While the isotropic case has been studied extensively, the anisotropic case has received less attention, and the impact of the covariance matrix on the learning dynamics remains unclear. For instance, Mousavi-Hosseini et al. (2023b) proposed a spherical SGD that requires a separate estimate of the data covariance matrix, thereby oversimplifying the influence of the covariance. In this study, we analyze the learning dynamics of vanilla SGD under the SIM with anisotropic input data and show that vanilla SGD automatically adapts to the data's covariance structure. Leveraging these results, we derive upper and lower bounds on the sample complexity in terms of an effective dimension that is determined by the structure of the covariance matrix rather than by the input dimension.
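This is a minimal sketch of the setting described in the abstract. It rests on assumptions that are not taken from the paper:

- the teacher is $y = \tanh(\langle w^*, x\rangle)$ with no label noise;
- the student is a single $\tanh$ neuron trained on the squared loss;
- online SGD draws one fresh sample $x \sim N(0, Q)$ per step;
- the diagonal covariance, step size and step count are hypothetical.

It only illustrates that plain SGD runs without any covariance estimate or preconditioning. It does not reproduce the paper's link function, initialization scheme or step-size schedule.

```python
import numpy as np

def q_norm(w, Q):
    """||Q^{1/2} w||, computed as sqrt(w^T Q w) without forming Q^{1/2}."""
    return float(np.sqrt(w @ Q @ w))

def q_alignment(w, w_star, Q):
    """Cosine between w and w_star in the Q-weighted inner product."""
    return float((w @ Q @ w_star) / (q_norm(w, Q) * q_norm(w_star, Q)))

def vanilla_online_sgd(Q, w_star, w0, lr=0.02, n_steps=20_000, seed=0):
    """Online SGD on one tanh neuron with a fresh sample x ~ N(0, Q) per step.

    Hypothetical setup: teacher y = tanh(<w_star, x>) (noiseless), student
    f(x) = tanh(<w, x>), per-sample loss 0.5 * (f(x) - y)^2.
    """
    rng = np.random.default_rng(seed)
    L = np.linalg.cholesky(Q)          # x = L g with g ~ N(0, I) has Cov(x) = Q
    w = w0.copy()
    for _ in range(n_steps):
        x = L @ rng.standard_normal(len(w))
        y = np.tanh(w_star @ x)        # teacher label
        pred = np.tanh(w @ x)          # student prediction
        # Gradient of 0.5 * (pred - y)^2 with respect to w.
        w -= lr * (pred - y) * (1.0 - pred ** 2) * x
    return w

d = 20
Q = np.diag(np.linspace(1.0, 0.1, d))  # anisotropic spectrum (hypothetical choice)
w_star = np.zeros(d)
w_star[0] = 1.0                        # target direction with ||Q^{1/2} w_star|| = 1
w0 = 0.1 * np.random.default_rng(1).standard_normal(d) / np.sqrt(d)  # small random init

w_final = vanilla_online_sgd(Q, w_star, w0)
print(q_alignment(w0, w_star, Q), q_alignment(w_final, w_star, Q))
```

Alignment is measured in the $Q$-weighted inner product, which matches the $\|Q^{1/2}\cdot\|$ norms appearing in Theorem 1. That choice of metric is ours.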

Paper Structure

This paper contains 41 sections, 17 theorems, 76 equations, and 4 figures.

Key Result

Theorem 1

Assume that Assumptions ass:q, ass:ie, ass:init, and ass:coeff hold, and that the initialization scaling $r$ is chosen such that $\|Q^{1/2}w^{(0)}\| = c_r\|Q^{1/2}w^*\|$ for some constant $c_r\in (0,1]$.
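The initialization condition in Theorem 1 involves only the quantity $\|Q^{1/2}w\|$. This quantity equals $\sqrt{w^\top Q w}$, so the required rescaling can be computed without forming a matrix square root. The sketch assumes a Gaussian random direction for $w^{(0)}$. That choice, the function name `scaled_init`, and the example covariance are hypothetical. The factor applied to the random direction plays the role of the scaling $r$. The check at the end compares the result against an explicit symmetric square root $Q^{1/2} = V\,\mathrm{diag}(\sqrt{\lambda})\,V^\top$.

```python
import numpy as np

def scaled_init(Q, w_star, c_r, seed=0):
    """Random init rescaled so that ||Q^{1/2} w0|| = c_r * ||Q^{1/2} w_star||.

    Uses ||Q^{1/2} w||^2 = w^T Q w; the random direction is a hypothetical choice.
    """
    if not 0.0 < c_r <= 1.0:
        raise ValueError("c_r must lie in (0, 1]")
    rng = np.random.default_rng(seed)
    w0 = rng.standard_normal(len(w_star))
    target = c_r * np.sqrt(w_star @ Q @ w_star)
    return w0 * (target / np.sqrt(w0 @ Q @ w0))  # this factor acts as the scaling r

# Check against an explicit symmetric square root of Q.
d = 5
A = np.random.default_rng(1).standard_normal((d, d))
Q = A @ A.T + 0.1 * np.eye(d)  # a generic SPD covariance (hypothetical)
w_star = np.ones(d)
w0 = scaled_init(Q, w_star, c_r=0.5)
lam, V = np.linalg.eigh(Q)
Q_half = V @ np.diag(np.sqrt(lam)) @ V.T
ratio = np.linalg.norm(Q_half @ w0) / np.linalg.norm(Q_half @ w_star)
print(ratio)  # 0.5 up to floating-point error
```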

Figures (4)

  • Figure 1: Comparison of learning dynamics in isotropic and anisotropic settings.
  • Figure 2: Comparison of learning dynamics between vanilla SGD and spherical SGD.
  • Figure 3: Comparison of learning dynamics between vanilla SGD and RepSGD.
  • Figure 4: Comparison of learning dynamics between vanilla SGD and RepSGD.

Theorems & Definitions (38)

  • Remark 1
  • Remark 2
  • Remark 3
  • Theorem 1
  • Remark 4
  • Remark 5
  • Theorem 2: CSQ lower-bound
  • Proof
  • Remark 6
  • Remark 7
  • ...and 28 more