
Topological Exploration of High-Dimensional Empirical Risk Landscapes: general approach, and applications to phase retrieval

Antoine Maillard, Tony Bonnaire, Giulio Biroli

TL;DR

This work considers the landscape of empirical risk minimization for high-dimensional Gaussian single-index models (generalized linear models) and shows that some variational formulas previously established in the literature can be drastically simplified, reducing to explicit variational problems over a finite number of scalar parameters that can be solved efficiently numerically.

Abstract

We consider the landscape of empirical risk minimization for high-dimensional Gaussian single-index models (generalized linear models). The objective is to recover an unknown signal $\boldsymbol{\theta}^\star \in \mathbb{R}^d$ (where $d \gg 1$) from a loss function $\hat{R}(\boldsymbol{\theta})$ that depends on pairs of labels $(\mathbf{x}_i \cdot \boldsymbol{\theta}, \mathbf{x}_i \cdot \boldsymbol{\theta}^\star)_{i=1}^n$, with $\mathbf{x}_i \sim \mathcal{N}(0, I_d)$, in the proportional asymptotic regime $n \asymp d$. Using the Kac-Rice formula, we analyze different complexities of the landscape -- defined as the expected number of critical points -- corresponding to various types of critical points, including local minima. We first show that some variational formulas previously established in the literature for these complexities can be drastically simplified, reducing to explicit variational problems over a finite number of scalar parameters that we can efficiently solve numerically. Our framework also provides detailed predictions for properties of the critical points, including the spectral properties of the Hessian and the joint distribution of labels. We apply our analysis to the real phase retrieval problem for which we derive complete topological phase diagrams of the loss landscape, characterizing notably BBP-type transitions where the Hessian at local minima (as predicted by the Kac-Rice formula) becomes unstable in the direction of the signal. We test the predictive power of our analysis to characterize gradient flow dynamics, finding excellent agreement with finite-size simulations of local optimization algorithms, and capturing fine-grained details such as the empirical distribution of labels. Overall, our results open new avenues for the asymptotic study of loss landscapes and topological trivialization phenomena in high-dimensional statistical models.
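
As a rough illustration of this setup (a sketch, not the authors' code), the snippet below draws Gaussian covariates $\mathbf{x}_i$, builds an empirical risk of the form $\hat{R}(\boldsymbol{\theta}) = \frac{1}{n} \sum_{i=1}^n \ell(\mathbf{x}_i \cdot \boldsymbol{\theta}, \mathbf{x}_i \cdot \boldsymbol{\theta}^\star)$ for real phase retrieval, and runs projected gradient descent on the unit sphere as a discrete stand-in for gradient flow. Several choices are assumptions made purely for illustration: the plain quartic loss $\ell(y, y^\star) = (y^2 - y^{\star 2})^2 / 4$ (the paper's own loss is not reproduced in this excerpt), the $1/n$ normalization, the unit-norm constraint, the step size, the problem size, and every function name.

```python
import numpy as np


def make_data(n, d, rng):
    """Draw a unit-norm signal and n standard Gaussian covariates (assumed normalization)."""
    theta_star = rng.standard_normal(d)
    theta_star /= np.linalg.norm(theta_star)
    X = rng.standard_normal((n, d))  # rows are x_i ~ N(0, I_d)
    return X, theta_star


def risk(theta, X, theta_star):
    """Empirical risk with the illustrative loss l(y, y*) = (y^2 - y*^2)^2 / 4."""
    y, y_star = X @ theta, X @ theta_star
    return np.mean((y**2 - y_star**2) ** 2) / 4.0


def risk_grad(theta, X, theta_star):
    """Euclidean gradient of the risk; d/dy of the loss is y * (y^2 - y*^2)."""
    y, y_star = X @ theta, X @ theta_star
    return X.T @ (y * (y**2 - y_star**2)) / X.shape[0]


def spherical_gradient_descent(theta0, X, theta_star, lr=0.01, steps=500):
    """Projected gradient descent on the unit sphere (hypothetical hyperparameters)."""
    theta = theta0 / np.linalg.norm(theta0)
    for _ in range(steps):
        g = risk_grad(theta, X, theta_star)
        g = g - (g @ theta) * theta      # keep only the tangential component
        theta = theta - lr * g
        theta /= np.linalg.norm(theta)   # retract back onto the sphere
    return theta


rng = np.random.default_rng(0)
d, alpha = 50, 6.5                       # alpha = n / d, the sample ratio
n = int(alpha * d)
X, theta_star = make_data(n, d, rng)
theta = spherical_gradient_descent(rng.standard_normal(d), X, theta_star)

# The risk is invariant under theta -> -theta, so the overlap is reported up to sign.
q = abs(theta @ theta_star)
print(f"|q| = {q:.3f}, risk = {risk(theta, X, theta_star):.4f}")
```

With these definitions, the overlap $q = \boldsymbol{\theta} \cdot \boldsymbol{\theta}^\star$ and the final risk value play the roles of the quantities $q$ and $e$ that index the complexities in the figure captions below; at such a small $d$, finite-size effects are expected to be substantial.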

Paper Structure (41 sections, 77 equations, 14 figures, 1 algorithm)

This paper contains 41 sections, 77 equations, 14 figures, 1 algorithm.

Figures (14)

  • Figure 1: In the phase retrieval problem (see eq. \ref{eq:def_ell_a}), with $a = 0.01$, $q = 0.0$, and $\alpha = 6.5$, we show the annealed Kac-Rice prediction for the Hessian density of typical-energy minima, see eq. \ref{eq:rho0}. The predicted complexity is ${\widetilde{\Sigma}}_0(q) \simeq 7 \times 10^{-3} > 0$ (see also Fig. \ref{fig:phase_diagram} below). The green arrow represents the position of the negative "BBP" outlier in the spectrum predicted by our theory for this Hessian density, see eq. \ref{eq:outlier}. The red-shaded area delimits the predicted location of the outlier for all minima with positive complexity, from the highest-energy to the lowest-energy ones. The histograms correspond to numerical results obtained using finite-size gradient descent simulations, see Section \ref{subsec:numerical_setting} for more details.
  • Figure 2: For $a = 0.01$, $q = 0$, and three values $\alpha = 6.0 < \alpha_{\mathrm{triv.}}$, $\alpha = 7.5 \simeq \alpha_{\mathrm{triv.}}$ and $\alpha = 8.0 > \alpha_{\mathrm{triv.}}$, we show: (left) the complexity ${\widetilde{\Sigma}}_0(q=0, e)$ for different values of $e$ around the maximal complexity, (middle) the density of the Hessian at local minima of typical energy (at $e_\star \coloneqq \mathop{\mathrm{arg\,max}}\limits_{e} {\widetilde{\Sigma}}_0(q = 0, e)$), and (right) the corresponding law $\nu(y, y^\star)$. The results are obtained by solving the optimization problem of eq. \ref{eq:Sigma_0_scalar}.
  • Figure 3: For $a = 0.01$, $q = 0.4$, and three values $\alpha = 4.0 < \alpha_{\mathrm{triv.}}$, $\alpha = 4.55 \simeq \alpha_{\mathrm{triv.}}$ and $\alpha = 5.0 > \alpha_{\mathrm{triv.}}$, we show: (left) the complexity $\Sigma_\mathrm{tot.}(q, e)$ for different values of $e$ around the maximal complexity, (middle) the density of the Hessian at critical points of typical energy (at $e_\star \coloneqq \mathop{\mathrm{arg\,max}}\limits_{e} \Sigma_\mathrm{tot.}(q, e)$), and (right) the corresponding law $\nu(y, y^\star)$. The results are obtained by solving the optimization problem of eq. \ref{eq:Sigma_TC_scalar}.
  • Figure 4: For $a = 0.01$, $q = 0.0$, we show (left) the energy of the typical, lowest, and highest-energy minima as a function of $\alpha$. The inset shows the value of the complexity ${\widetilde{\Sigma}}_0$ (notice that highest and lowest-energy minima are always at zero complexity by definition). In the red region, the complexity of local minima is negative. (Right) The spectral density $\rho_0(w)$ of the Hessian at these minima, for $\alpha = 6.0$. We show in dotted red the line $w = 0$. In the inset, we show the Lagrange multiplier $\lambda_\star$ as a function of $\alpha$. The results are obtained by solving the Kac-Rice formula of eq. \ref{eq:Sigma_0_scalar}.
  • Figure 5: Phase diagram predicted by the Kac-Rice method, for $a = 0.01$.
  • ...and 9 more figures