Pseudo-Maximum Likelihood Theory for High-Dimensional Rank One Inference

Curtis Grant, Aukosh Jagannath, Justin Ko

TL;DR

The paper develops a universal pseudo-maximum likelihood theory for high-dimensional rank-one inference, introducing four information parameters that govern the limiting behavior across a wide class of models. It establishes a Parisi-variational framework that characterizes the limiting pseudo-likelihood and, via Gaussian equivalence, the performance of PMLEs, including a score-corrected variant to handle ill-scored models. The work shows strong and coarse equivalence between many estimation tasks (e.g., spiked matrix models and stochastic block models) and provides a complete description of the least-squares estimator in these settings, including phase transitions and failure modes. It applies the theory to Gaussian pseudolikelihoods, non-linear transforms of rank-one matrices, and a spectrum of examples (SBM, spiked Wigner, sparse PCA, Poisson-Bernoulli, etc.), offering a unifying lens for understanding statistical limits and algorithmic feasibility in high-dimensional rank-one inference.

Abstract

We develop a pseudo-likelihood theory for rank one matrix estimation problems in the high dimensional limit. We prove a variational principle for the limiting pseudo-maximum likelihood which also characterizes the performance of the corresponding pseudo-maximum likelihood estimator. We show that this variational principle is universal and depends only on four parameters determined by the corresponding null model. Through this universality, we introduce a notion of equivalence for estimation problems of this type and, in particular, show that a broad class of estimation tasks, including community detection, sparse submatrix detection, and non-linear spiked matrix models, are equivalent to spiked matrix models. As an application, we obtain a complete description of the performance of the least-squares (or "best rank one") estimator for any rank one matrix estimation problem.

Paper Structure

This paper contains 46 sections, 47 theorems, 327 equations, 2 figures, 1 table.

Key Result

Theorem 2.1

Suppose that $\mathbf{x}^{0,N}$ is tame and that the pair $(g_0,g)$ is well-scored with information parameters $\bar{\beta}$. For every $(S,M)\in \mathbb{R}^2$, we have that Here $\psi_{\bar{\beta}}$ is an explicit, Hölder continuous function given by eq:def-of-psi below, depending only on $\bar{\beta}$, $\Omega$, and $\mathbb{Q}$.

Figures (2)

  • Figure 1: The cosine similarity in the spiked matrix problem with a Rademacher latent variable and noise with mean $1$, solved using corrected and uncorrected least squares. A data matrix of size $2500 \times 2500$ was used, and the uncorrected and corrected likelihoods were optimized using gradient descent. The left plot displays the cosine similarity with the true signal for the corrected and uncorrected estimators against the number of steps. Correcting the likelihood removes the effect of the score parameter, and the corresponding pseudo-maximum likelihood estimator achieves a non-zero cosine similarity. The right plot shows the correlation with the spurious all-ones vector. The uncorrected model rapidly correlates with the spurious vector, while the corrected model decorrelates from it.
  • Figure 2: The cosine similarity in the sparse Rademacher problem for corrected and uncorrected least squares, and for the log-likelihood. A data matrix of size $2500 \times 2500$ was used; corrected and uncorrected least squares were each optimized using gradient descent. The plot displays the cosine similarity against the number of steps. The least-squares estimator was uninformative and always achieved a cosine similarity of zero. However, when gradient descent was performed on the log-likelihood, the maximum likelihood estimator achieved non-trivial performance. This demonstrates the failure of least squares in some problems that are solvable by MLE.
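
The qualitative behavior described for Figure 1 can be reproduced with a small sketch. It is not the authors' code, and several choices are assumptions made for illustration: the scaling $Y = (\lambda/N)\, x_0 x_0^{\top} + (W + \mu)/\sqrt{N}$, the hypothetical signal strength $\lambda = 3$, a reduced size $N = 200$ (the figure uses $2500$) to keep pure-Python runtime short, and a "correction" that simply subtracts the known noise mean $\mu = 1$ as a stand-in for the paper's score correction. On the sphere $\|x\|^2 = N$, minimizing $\|Y - (\lambda/N)\, x x^{\top}\|_F^2$ is equivalent to maximizing $x^{\top} Y x$, which is what the projected gradient ascent below does.

```python
import math
import random


def make_spiked_matrix(n, snr, noise_mean, rng):
    """Return (Y, x0) with Y = (snr/n) x0 x0^T + (W + noise_mean)/sqrt(n).

    x0 is Rademacher; W is symmetric with unit-variance Gaussian entries.
    The scaling is an assumption chosen for this illustration.
    """
    x0 = [rng.choice((-1.0, 1.0)) for _ in range(n)]
    y = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            noise = rng.gauss(noise_mean, 1.0)
            val = snr * x0[i] * x0[j] / n + noise / math.sqrt(n)
            y[i][j] = val
            y[j][i] = val
    return y, x0


def subtract_noise_mean(y, noise_mean):
    """Assumed stand-in for the score correction: remove the known noise mean."""
    shift = noise_mean / math.sqrt(len(y))
    return [[v - shift for v in row] for row in y]


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / math.sqrt(sum(a * a for a in u) * sum(b * b for b in v))


def projected_gradient_ascent(y, x_init, steps, lr):
    """Maximize x^T Y x over the sphere |x|^2 = n by projected gradient steps."""
    n = len(y)
    x = list(x_init)
    for _ in range(steps):
        # Y x is half the gradient of x^T Y x; the factor 2 is absorbed in lr.
        grad = [sum(a * b for a, b in zip(row, x)) for row in y]
        x = [xi + lr * gi for xi, gi in zip(x, grad)]
        scale = math.sqrt(n) / math.sqrt(sum(v * v for v in x))
        x = [v * scale for v in x]  # project back onto the sphere
    return x


rng = random.Random(0)
N, SNR, NOISE_MEAN = 200, 3.0, 1.0  # illustrative values, not from the paper
Y, x0 = make_spiked_matrix(N, SNR, NOISE_MEAN, rng)
ones = [1.0] * N
x_init = [rng.gauss(0.0, 1.0) for _ in range(N)]

x_unc = projected_gradient_ascent(Y, x_init, steps=100, lr=0.5)
x_cor = projected_gradient_ascent(
    subtract_noise_mean(Y, NOISE_MEAN), x_init, steps=100, lr=0.5
)

print("uncorrected: |cos(x, x0)|   =", abs(cosine(x_unc, x0)))
print("uncorrected: |cos(x, ones)| =", abs(cosine(x_unc, ones)))
print("corrected:   |cos(x, x0)|   =", abs(cosine(x_cor, x0)))
print("corrected:   |cos(x, ones)| =", abs(cosine(x_cor, ones)))
```

Under these assumptions the noise mean contributes a rank-one term along the all-ones direction with eigenvalue of order $\sqrt{N}$, which dominates the order-one signal eigenvalue. This is why the uncorrected iterate aligns with the all-ones vector while the mean-subtracted one aligns with $x_0$.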

Theorems & Definitions (101)

  • Definition 2.1
  • Remark 2.1
  • Definition 2.2
  • Definition 2.3
  • Definition 2.4: well-scored pseudo-likelihood
  • Theorem 2.1
  • Theorem 2.2
  • Remark 2.2
  • Corollary 2.1
  • Example 2.1
  • ...and 91 more