Asymptotic Dynamics of Alternating Minimization for Bilinear Regression

Koki Okajima; Takashi Takahashi

TL;DR

The paper addresses how alternating minimization behaves for bilinear regression in the high-dimensional proportional regime. It develops a chain-of-replicas, multi-temperature replica analysis that yields a two-dimensional Gaussian effective dynamics with memory, providing a closed-form quenched-average description of AM over iterations. A key finding is that, for finite $\kappa$ and finite iterations starting from random initialization ($m_0=0$), retrieval of the targets is impossible, a prediction borne out by finite-size simulations; memory effects are pronounced early on and become short-ranged later. The framework offers a general tool for analyzing iterative algorithms under random designs and can extend to online settings and other loss functions, with implications for initialization strategies and potential algorithmic phase transitions at critical sample complexities.

Abstract

This study investigates the dynamics of alternating minimization applied to a bilinear regression task with normally distributed covariates, in the asymptotic system-size limit where the number of parameters and the number of observations diverge at the same rate. This is achieved by applying the replica method to a multi-temperature glassy system which unfolds the algorithm's time evolution. Our results show that the dynamics can be described effectively by a two-dimensional discrete stochastic process, where each step depends on all previous time steps, revealing the structure of the memory dependence in the evolution of alternating minimization. The theoretical framework developed in this work can be applied to the analysis of various iterative algorithms, extending beyond the scope of alternating minimization.
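To make the algorithm under study concrete, here is a minimal sketch of alternating minimization on a small bilinear regression instance. It is an illustration under stated assumptions, not the authors' implementation: the observation model $y_\mu = \bm{u}_\star^\top A_\mu \bm{v}_\star$ with i.i.d. Gaussian $A_\mu$, the ridge penalty `lam`, the use of `kappa` as the ratio of observations to dimension, the normalization, and all function names are hypothetical choices made here. The helper `product_cosine` is one plausible reading of the "product cosine similarity" named in the outline.

```python
import numpy as np

def ridge_solve(X, y, lam):
    # Closed-form ridge least squares: argmin_w ||y - X w||^2 + lam ||w||^2
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

def objective(A, y, u, v, lam):
    # Regularized squared loss of the bilinear model y_mu ~ u^T A_mu v
    resid = y - np.einsum("pnm,n,m->p", A, u, v)
    return resid @ resid + lam * (u @ u + v @ v)

def product_cosine(u, v, u_star, v_star):
    # Product of the two cosine similarities (assumed overlap measure)
    cos = lambda a, b: (a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))
    return cos(u, u_star) * cos(v, v_star)

def alternating_minimization(A, y, u0, v0, n_iter=10, lam=1e-3):
    u, v = u0.copy(), v0.copy()
    losses = [objective(A, y, u, v, lam)]
    for _ in range(n_iter):
        # With v fixed the model is linear in u: design matrix rows are A_mu v
        u = ridge_solve(A @ v, y, lam)
        # With u fixed the model is linear in v: design matrix rows are A_mu^T u
        v = ridge_solve(np.einsum("pnm,n->pm", A, u), y, lam)
        losses.append(objective(A, y, u, v, lam))
    return u, v, losses

# Small synthetic instance (sizes chosen for speed, far from the asymptotic regime)
rng = np.random.default_rng(0)
N, M, kappa = 10, 10, 10          # kappa: assumed ratio of observations to N
P = int(kappa * N)
u_star, v_star = rng.standard_normal(N), rng.standard_normal(M)
A = rng.standard_normal((P, N, M))
y = np.einsum("pnm,n,m->p", A, u_star, v_star)   # noiseless observations

# Informed initialization: positive overlap with the targets
u0 = u_star + rng.standard_normal(N)
v0 = v_star + rng.standard_normal(M)
u, v, losses = alternating_minimization(A, y, u0, v0)
overlap = product_cosine(u, v, u_star, v_star)
```

Each half-step solves a ridge regression exactly, so the regularized objective recorded in `losses` cannot increase from one iteration to the next. Replacing `u0` and `v0` with draws independent of the targets gives the random-initialization setting discussed in the TL;DR, although an instance this small says nothing about the high-dimensional prediction.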

Paper Structure (24 sections, 86 equations, 6 figures)

Introduction
Summary of main results.
The model
Replica analysis for alternating minimization
Alternating minimization as a stochastic process
Outline of the derivation
Average generating function and saddle point equation
Expression for $\mathcal{S}_v^t, \mathcal{S}_u^t$.
Expression for $\mathcal{E}_v^t$, $\mathcal{E}_u^t$.
Relation to the online setup.
Characterization of the dynamics of alternating minimization
Generic factorized priors on the target vectors.
Impossible retrieval from random initialization
Numerical comparison with finite size experiments
Time evolution of the product cosine similarity
...and 9 more sections

Figures (6)

Figure 1: Comparison of the theoretical value (solid line) of $m^t$ and the empirical value (markers) obtained from experiments for $N = 16000$. The theoretical value was obtained by solving the fixed-point equations given in \ref{eq:u_saddlepoint} and \ref{eq:v_saddlepoint}. The empirical value was obtained by taking the mean over 64 random configurations of $\mathcal{D}$. Error bars represent the standard error of the mean.
Figure 2: Detailed dynamics of $m^t$ for $m_0 = 0.15$ (top) and $m_0 = 0.30$ (bottom) for various values of $\kappa$. The thin green lines correspond to all 64 independent runs of AM with system size $N = 16000$. We see that the variance of $m^t$ is large for small $\kappa$ and $m_0$, with both mean and median of the population of trajectories deviating from the theoretical value.
Figure 3: Values of $\delta m^2 (m_0, \kappa, N)$ (upper panel) and its normalized counterpart $\delta_{\rm norm}^2 (m_0, \kappa, N)$ (lower panel) for $m_0 = 0.15$ (left), $0.30$ (middle) and $0.60$ (right) as a function of $\kappa$ for various values of $N$. The average over $\mathcal{D}$ was taken over 1024, 256, 256, 64 and 64 random configurations for $N =$ 1000, 2000, 4000, 8000 and 16000 respectively. Error bars represent the standard error of the mean.
Figure 4: Integral of $\delta m^2(m_0, \kappa, N)$ over $\kappa$ for $m_0 = 0.15, 0.30$ and $0.60$ as a function of $N$ in normal scale (left) and log-log scale (right). The integral, calculated using the trapezoidal rule, was taken over the region displayed in figure \ref{fig:Delta_M}. Error bars represent the standard error of the mean, which are too small to be visible.
Figure 5: Comparison of the empirical distribution of $\bm{u}^t$ and its theoretical counterpart $\mathsf{u}^t$ (left), and the empirical distribution of $\bm{v}^t$ and $\mathsf{v}^t$ (right) for $t = 1, 3, 7$ and $m_0 = 0.30, \kappa = 5.0$. The empirical distribution was obtained from a single random instance of size $N= 16000$.
...and 1 more figure

Theorems & Definitions (1)

Claim 1
