Implicit Bias and Convergence of Matrix Stochastic Mirror Descent

Danil Akhtiamov; Reza Ghane; Babak Hassibi

TL;DR

We prove that SMD with matrix mirror functions $\psi(\cdot)$ converges exponentially to a global interpolator, and we generalize classical implicit bias results of vector SMD by demonstrating that the matrix SMD algorithm converges to the unique solution minimizing the Bregman divergence induced by $\psi(\cdot)$ from initialization, subject to interpolating the data.

Abstract

We investigate Stochastic Mirror Descent (SMD) with matrix parameters and vector-valued predictions, a framework relevant to multi-class classification and matrix completion problems. Focusing on the overparameterized regime, where the total number of parameters exceeds the number of training samples, we prove that SMD with matrix mirror functions $ψ(\cdot)$ converges exponentially to a global interpolator. Furthermore, we generalize classical implicit bias results of vector SMD by demonstrating that the matrix SMD algorithm converges to the unique solution minimizing the Bregman divergence induced by $ψ(\cdot)$ from initialization subject to interpolating the data. These findings reveal how matrix mirror maps dictate inductive bias in high-dimensional, multi-output problems.
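To make the update concrete, here is a minimal sketch of one matrix SMD step, $\nabla\psi(W_{t+1}) = \nabla\psi(W_t) - \eta \nabla \mathcal{L}_t(W_t)$. It rests on assumptions that are not taken from the paper: the mirror is chosen as $\psi(W) = \frac{1}{p}\|W\|_{S_p}^p$ with $p > 1$ (a Schatten-$p$ potential), the per-sample loss is the squared error on a single linear measurement $\langle A_i, W \rangle = y_i$, and the function names `schatten_mirror`, `schatten_mirror_inv`, and `smd_step` are hypothetical.

```python
import numpy as np


def schatten_mirror(W, p):
    """Mirror map: gradient of the assumed potential psi(W) = (1/p) * ||W||_{S_p}^p.

    For a spectral function the gradient keeps the singular vectors of W
    and raises each singular value to the power p - 1 (requires p > 1).
    """
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    return (U * s ** (p - 1)) @ Vt


def schatten_mirror_inv(Z, p):
    """Inverse mirror map: raise the singular values of Z to 1 / (p - 1)."""
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    return (U * s ** (1.0 / (p - 1))) @ Vt


def smd_step(W, A_i, y_i, p, lr):
    """One SMD step on the assumed loss 0.5 * (<A_i, W> - y_i)^2.

    The gradient step is taken in the mirror (dual) domain and then
    mapped back to the parameter domain.
    """
    residual = np.sum(A_i * W) - y_i       # <A_i, W> - y_i
    grad = residual * A_i                  # gradient of the squared loss
    Z = schatten_mirror(W, p) - lr * grad  # step in the mirror domain
    return schatten_mirror_inv(Z, p)       # map back to parameters
```

With `p = 2` the mirror map is the identity and the step reduces to plain SGD. For matrix completion, `A_i` would be a matrix with a single 1 at the observed entry, so each step touches one observed entry through the mirror map.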

Paper Structure (14 sections, 6 theorems, 44 equations, 1 figure)

Introduction
Notation and Problem Formulation
The problem
Optimization Framework
Mathematical Preliminaries
Main Results and Applications
Proofs
Proof of Convergence
Implicit Bias
Convergence Rate
Experimental Setup
Methods
Results
Conclusion

Key Result

Theorem 1

Assume that the linear operator $\mathcal{A}: \mathbb{R}^{d \times k} \to \mathbb{R}^p$, the mirror $\psi: \mathbb{R}^{d \times k} \to \mathbb{R}$, and the training losses $\mathcal{L}_t: \mathbb{R}^{d \times k} \to \mathbb{R}$ satisfy Assumptions 1-4 from the list of Assumptions (ass: main) [...]. Denote the $t$-th iteration of the SMD algorithm defined via (eq: L_t) with mirror $\psi$ trained to [...]

Figures (1)

Figure 1: Relative recovery error versus sampling probability for SVT (Cai et al., 2010), Soft-Impute (Mazumder et al., 2010), and Schatten-$p$ SMD.

Theorems & Definitions (25)

Definition 1: Linear Constraint System
Example 1: Matrix Completion
Example 2: Multi-class Linear Classification
Definition 2: Training Objective
Definition 3: Matrix Stochastic Mirror Descent
Definition 4: Matrix Convexity Properties
Definition 5: Matrix Bregman Divergence
Definition 6: Schatten Norm
Theorem 1: Convergence Rate and the Implicit Bias
Remark 1
...and 15 more
