Fast and Accurate Estimation of Low-Rank Matrices from Noisy Measurements via Preconditioned Non-Convex Gradient Descent

Gavin Zhang, Hong-Ming Chiu, Richard Y. Zhang

TL;DR

The paper tackles low-rank matrix recovery from noisy measurements using a preconditioned non-convex gradient descent approach. It introduces iterations that apply a right preconditioner with a geometrically decaying regularization parameter $\eta$ and proves local linear convergence to minimax error for symmetric matrix sensing under RIP, with a noise-driven error floor $\mathcal{E}_{opt}=\frac{\sigma^{2}nr\log n}{m}$. A key insight is a coupling between $\eta_t$ and the current error, enabled by a change of norm that preserves gradient dominance in the presence of noise, yielding convergence rates independent of ill-conditioning and over-parameterization. The method demonstrates strong empirical performance on Gaussian matrix sensing and a 60 MP ultrafast ultrasound denoising task, outperforming prior preconditioning schemes and achieving effective noise suppression with practical iteration counts.

Abstract

Non-convex gradient descent is a common approach for estimating a low-rank $n\times n$ ground truth matrix from noisy measurements, because it has per-iteration costs as low as $O(n)$ time, and is in theory capable of converging to a minimax optimal estimate. However, the practitioner is often constrained to just tens to hundreds of iterations, and the slow and/or inconsistent convergence of non-convex gradient descent can prevent a high-quality estimate from being obtained. Recently, the technique of preconditioning was shown to be highly effective at accelerating the local convergence of non-convex gradient descent when the measurements are noiseless. In this paper, we describe how preconditioning should be done for noisy measurements to accelerate local convergence to minimax optimality. For the symmetric matrix sensing problem, our proposed preconditioned method is guaranteed to locally converge to minimax error at a linear rate that is immune to ill-conditioning and/or over-parameterization. Using our proposed preconditioned method, we perform a 60 megapixel medical image denoising task, and observe significantly reduced noise levels compared to previous approaches.
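The iteration described in the TL;DR can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions, not the paper's Algorithm 1, which this page does not reproduce. It assumes the update $X_{t+1}=X_t-\alpha\nabla f(X_t)(X_t^{T}X_t+\eta_t I)^{-1}$ with $\eta_{t+1}=\beta\eta_t$ and the loss $f(X)=\frac{1}{2m}\|\mathcal{A}(XX^{T})-y\|^{2}$. The names `sense`, `precond_gd`, and `run_demo`, the problem sizes, and the values of `alpha`, `eta0`, and `beta` are hypothetical choices made for the example. A small perturbation of the ground truth stands in for a proper initialization inside the local convergence region.

```python
import numpy as np

def sense(A, M):
    """Apply the linear measurement operator: entry k is <A_k, M>."""
    return np.einsum("kij,ij->k", A, M)

def precond_gd(A, y, X0, alpha=0.1, eta0=0.1, beta=0.9, iters=60):
    """Preconditioned gradient descent on f(X) = (1/2m) * ||A(X X^T) - y||^2.

    Assumed update (a sketch, not taken verbatim from the paper):
        X <- X - alpha * grad f(X) @ inv(X^T X + eta * I),   eta <- beta * eta
    """
    m, r = len(y), X0.shape[1]
    X, eta = X0.copy(), eta0
    for _ in range(iters):
        resid = sense(A, X @ X.T) - y
        # For symmetric A_k the gradient is (2/m) * sum_k resid_k * A_k @ X.
        grad = (2.0 / m) * np.einsum("k,kij->ij", resid, A) @ X
        # Right preconditioner; the matrix is symmetric, so
        # grad @ inv(P) equals solve(P, grad.T).T.
        P = X.T @ X + eta * np.eye(r)
        X = X - alpha * np.linalg.solve(P, grad.T).T
        eta *= beta  # geometric decay of the regularization parameter
    return X

def run_demo(seed=0, n=20, r=4, m=2000, sigma=1e-3):
    """Rank-2 ground truth, search rank r > 2 (over-parameterized), noisy data."""
    rng = np.random.default_rng(seed)
    U, _ = np.linalg.qr(rng.standard_normal((n, 2)))
    Z = U * np.sqrt([1.0, 0.2])            # M_star has eigenvalues 1.0 and 0.2
    M_star = Z @ Z.T
    G = rng.standard_normal((m, n, n))
    A = (G + G.transpose(0, 2, 1)) / 2     # symmetric Gaussian measurement matrices
    y = sense(A, M_star) + sigma * rng.standard_normal(m)
    # Hypothetical local initialization: ground truth padded with zero columns,
    # plus a small random perturbation.
    X0 = np.hstack([Z, np.zeros((n, r - 2))]) + 0.01 * rng.standard_normal((n, r))
    X = precond_gd(A, y, X0)
    err = lambda X_: np.linalg.norm(X_ @ X_.T - M_star)  # Frobenius error
    return err(X0), err(X)
```

Calling `run_demo()` returns the Frobenius error $\|XX^{T}-M^{\star}\|_F$ before and after the iterations. The schedule is chosen so that $\eta_t$ ends near the noise level rather than far below it: letting $\eta_t$ shrink well past the error floor would make the preconditioner amplify noise in the over-parameterized directions.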

Paper Structure (43 sections, 4 theorems, 38 equations, 9 figures)

Key Result

Theorem 2.1

Suppose that the initial point $X_{0}$ satisfies $\|\mathcal{A}(X_{0}X_{0}^{T}-M^{\star})\|^{2}<\rho^{2}(1-\delta)\lambda_{r^{\star}}(M^{\star})^{2}$ with a radius $\rho>0$ that satisfies $\rho^{2}/(1-\rho^{2})\le(1-\delta^{2})/2$. Let the step-size $\alpha$ satisfy $\alpha\leq 1/L$, where $L>0$ is a const... where $\mathcal{E}_{opt}=\frac{\sigma^{2}nr\log n}{m}$. Here the inequality $\lesssim$ hides a co...
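To get a feel for the scale of the error floor $\mathcal{E}_{opt}=\frac{\sigma^{2}nr\log n}{m}$ in the theorem, the formula can be evaluated directly. The helper name `error_floor` is hypothetical, and the sketch assumes $\log$ is the natural logarithm, which the excerpt above does not state.

```python
import math

def error_floor(sigma, n, r, m):
    """Evaluate E_opt = sigma^2 * n * r * log(n) / m.

    Assumption: log is the natural logarithm.
    """
    return sigma**2 * n * r * math.log(n) / m

# Example values (chosen for illustration only).
floor_small_m = error_floor(sigma=1e-3, n=20, r=4, m=2000)
floor_large_m = error_floor(sigma=1e-3, n=20, r=4, m=4000)
```

With these illustrative values, `floor_small_m` is about $1.2\times10^{-7}$. Doubling the number of measurements $m$ halves the floor, and the floor grows linearly in $r$ and in the noise variance $\sigma^{2}$.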

Figures (9)

  • Figure 1: Preconditioned gradient descent for a 60 megapixel medical image denoising task. We denoise a 2400-frame ultrafast ultrasound image of a rat brain ($200\times 130$ pixels per frame) by running 30 iterations of the low-rank denoising procedure in [demene2015spatiotemporal]. Top-left: original noisy input. Top-right: image denoised and reconstructed by our preconditioning scheme in Algorithm 1. Bottom-left: image obtained via the preconditioning scheme in [zhang2021preconditioned], which is the previous state-of-the-art. Bottom-right: image obtained by naive non-convex gradient descent without preconditioning.
  • Figure 2: Convergence of our algorithm and ScaledGD$(\lambda)$ using spectral initialization. Left: Noiseless measurements. Right: Noisy measurements with noise variance $\sigma=10^{-6}$.
  • Figure 3: Convergence of our algorithm and PrecGD using spectral initialization. Left: Noiseless measurements. Right: Noisy measurements with noise variance $\sigma=10^{-6}$.
  • Figure 4: Convergence of our algorithm (spectral init.), ScaledGD$(\lambda)$ (small init.) and GD (small init.) for Gaussian matrix sensing. Left: Noiseless measurements. Right: Noisy measurements with noise variance $\sigma=10^{-6}$.
  • Figure 5: Convergence of our algorithm, PrecGD, ScaledGD$(\lambda)$ and GD for Gaussian matrix sensing. Left: Noiseless measurements with $r=r^\star$. Right: Noisy measurements with $r>r^\star$.
  • ...and 4 more figures

Theorems & Definitions (7)

  • Definition 2.1: Restricted Isometry
  • Theorem 2.1
  • Theorem A.1: Noiseless gradient dominance
  • Lemma A.2
  • Lemma A.3
  • proof
  • proof
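
Definition 2.1 is listed above by title only. For orientation, the restricted isometry property is commonly stated in the matrix sensing literature as follows. The rank bound and the normalization of $\mathcal{A}$ used in the paper may differ from this generic form, so it should be read as an assumption rather than as the paper's definition. The operator $\mathcal{A}$ satisfies the property with constant $\delta\in[0,1)$ at rank $k$ if

$$(1-\delta)\,\|M\|_{F}^{2}\;\le\;\|\mathcal{A}(M)\|^{2}\;\le\;(1+\delta)\,\|M\|_{F}^{2}\qquad\text{for all } M \text{ with } \operatorname{rank}(M)\le k.$$

Under this form, the factor $(1-\delta)$ in the initialization condition of Theorem 2.1 is the lower isometry bound that relates the measured residual $\|\mathcal{A}(X_{0}X_{0}^{T}-M^{\star})\|^{2}$ to the Frobenius error $\|X_{0}X_{0}^{T}-M^{\star}\|_{F}^{2}$.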