Analysing heavy-tail properties of Stochastic Gradient Descent by means of Stochastic Recurrence Equations

Ewa Damek, Sebastian Mentemeier

TL;DR

The theory of irreducible-proximal matrices provides the right framework for analysing the heavy-tail properties of stochastic gradient descent in linear regression.

Abstract

In recent works on the theory of machine learning, it has been observed that heavy tail properties of Stochastic Gradient Descent (SGD) can be studied in the probabilistic framework of stochastic recursions. In particular, Gürbüzbalaban et al. (arXiv:2006.04740) considered a setup corresponding to linear regression for which iterations of SGD can be modelled by a multivariate affine stochastic recursion $X_k=A_k X_{k-1}+B_k$, for independent and identically distributed pairs $(A_k, B_k)$, where $A_k$ is a random symmetric matrix and $B_k$ is a random vector. In this work, we will answer several open questions of the quoted paper and extend their results by applying the theory of irreducible-proximal (i-p) matrices.
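To make the recursion concrete, the following is a minimal sketch of how a mini-batch SGD step on a least-squares loss can be written in the affine form $X_k=A_k X_{k-1}+B_k$. It assumes the loss $\frac{1}{2b}\sum_{i=1}^{b}(a_i^\top x-y_i)^2$ with batch size $b$ and step size $\eta$, which gives $A_k = I - \frac{\eta}{b}\sum_i a_i a_i^\top$ and $B_k = \frac{\eta}{b}\sum_i a_i y_i$. The Gaussian data model, the default parameter values and the function names `sgd_affine_step` and `simulate` are illustrative assumptions, not taken from the paper.

```python
import numpy as np


def sgd_affine_step(x, a, y, eta):
    """One mini-batch SGD step on least squares, written as x -> A x + B.

    a: (b, d) array of feature vectors, y: (b,) array of responses.
    Assumed loss: (1 / (2 b)) * sum_i (a_i^T x - y_i)^2.
    """
    b, d = a.shape
    A = np.eye(d) - (eta / b) * (a.T @ a)  # random symmetric matrix A_k
    B = (eta / b) * (a.T @ y)              # random vector B_k
    return A @ x + B, A, B


def simulate(d=2, b=5, eta=0.75, n_steps=1000, seed=0):
    """Iterate the recursion with i.i.d. Gaussian data (an assumption)."""
    rng = np.random.default_rng(seed)
    x = np.zeros(d)
    norms = np.empty(n_steps)
    for k in range(n_steps):
        a = rng.standard_normal((b, d))  # hypothetical Gaussian features
        y = rng.standard_normal(b)       # hypothetical Gaussian responses
        x, _, _ = sgd_affine_step(x, a, y, eta)
        norms[k] = np.linalg.norm(x)     # record |X_k| along the trajectory
    return x, norms
```

Calling `simulate()` returns the final iterate and the sequence of norms $|X_k|$; inspecting the empirical distribution of those norms over a long run is one informal way to look at tail behaviour, under the assumptions stated above.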

Paper Structure (14 sections, 14 theorems, 111 equations, 2 figures)

Key Result

Proposition 2.1

Assume that $\mu_A$ satisfies (i-p-nc) and let $s \in I_k$. Then the following holds. The spectral radii $\rho(\mathrm{P^s})$ and $\rho(\mathrm{P^s_*})$ both equal $k(s)$, and there is a unique probability measure $\nu_{s}$ on $S$ and a unique function $r_{s}$ [...] Further, the function $r_{s}$ is strictly positive. Also, there is a unique probability measure [...]

Figures (2)

  • Figure 1: Contour plot of $h$ as a function of $b$ and $s$, for model (Rank1Gauss) with $d=2$ and $\eta=0.75$. The black line is the contour $k \equiv 1$. The values of $h$ have been truncated at level 2 for better visualization.
  • Figure 2: Contour plot of $h$ as a function of $\eta$ and $s$, for model (Rank1Gauss) with $d=2$ and $b=5$. The black line is the contour $k \equiv 1$. The values of $h$ have been truncated at level 2 for better visualization.

Theorems & Definitions (36)

  • Proposition 2.1
  • Proof
  • Proposition 2.2
  • Proof
  • Remark 2.3
  • Theorem 3.1
  • Proof
  • Remark 3.2
  • Theorem 3.3
  • Remark 3.4
  • ...and 26 more