Fast and Accurate Estimation of Low-Rank Matrices from Noisy Measurements via Preconditioned Non-Convex Gradient Descent

Gavin Zhang; Hong-Ming Chiu; Richard Y. Zhang

TL;DR

The paper tackles low-rank matrix recovery from noisy measurements using a preconditioned non-convex gradient descent approach. It introduces iterations that apply a right preconditioner with a geometrically decaying regularization parameter $\eta$ and proves local linear convergence to minimax error for symmetric matrix sensing under RIP, with a noise-driven error floor $\mathcal{E}_{opt}=\frac{\sigma^{2}nr\log n}{m}$. A key insight is a coupling between $\eta_t$ and the current error, enabled by a change of norm that preserves gradient dominance in the presence of noise, yielding convergence rates independent of ill-conditioning and over-parameterization. The method demonstrates strong empirical performance on Gaussian matrix sensing and a 60 MP ultrafast ultrasound denoising task, outperforming prior preconditioning schemes and achieving effective noise suppression with practical iteration counts.
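To give a feel for the scale of the noise-driven error floor quoted above, the formula $\mathcal{E}_{opt}=\frac{\sigma^{2}nr\log n}{m}$ can be evaluated directly. The sketch below is only an illustration: the function name `minimax_error_floor` is hypothetical, the logarithm is assumed to be the natural log, and the constant hidden by the paper's $\lesssim$ notation is ignored.

```python
import math


def minimax_error_floor(sigma, n, r, m):
    """Evaluate sigma^2 * n * r * log(n) / m (natural log assumed).

    sigma: noise level, n: matrix dimension, r: rank parameter,
    m: number of measurements. Hidden constants are not included.
    """
    return sigma ** 2 * n * r * math.log(n) / m


if __name__ == "__main__":
    # Hypothetical problem sizes, chosen only for illustration.
    print(minimax_error_floor(sigma=1e-3, n=100, r=5, m=5000))
```

Under this formula the floor scales linearly in $n$ and $r$ (up to the $\log n$ factor), quadratically in $\sigma$, and inversely in the number of measurements $m$.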

Abstract

Non-convex gradient descent is a common approach for estimating a low-rank $n\times n$ ground truth matrix from noisy measurements, because it has per-iteration costs as low as $O(n)$ time, and is in theory capable of converging to a minimax optimal estimate. However, the practitioner is often constrained to just tens to hundreds of iterations, and the slow and/or inconsistent convergence of non-convex gradient descent can prevent a high-quality estimate from being obtained. Recently, the technique of preconditioning was shown to be highly effective at accelerating the local convergence of non-convex gradient descent when the measurements are noiseless. In this paper, we describe how preconditioning should be done for noisy measurements to accelerate local convergence to minimax optimality. For the symmetric matrix sensing problem, our proposed preconditioned method is guaranteed to locally converge to minimax error at a linear rate that is immune to ill-conditioning and/or over-parameterization. Using our proposed preconditioned method, we perform a 60 megapixel medical image denoising task, and observe significantly reduced noise levels compared to previous approaches.
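The idea of a right preconditioner with a geometrically decaying regularization parameter can be sketched in a few lines of NumPy. This is a minimal sketch under stated assumptions, not the paper's Algorithm 1, which this summary does not reproduce: the update form $X \leftarrow X - \alpha \nabla f(X)(X^{T}X+\eta_t I)^{-1}$ with $\eta_{t}=\eta_0\beta^{t}$ is assumed from the TL;DR, the objective is the toy denoising loss $f(X)=\frac{1}{4}\|XX^{T}-Y\|_F^{2}$ (identity measurements) rather than general matrix sensing, and the names `loss_and_grad` and `preconditioned_gd`, the default parameters, and the demo problem sizes are all hypothetical.

```python
import numpy as np


def loss_and_grad(X, Y):
    """Toy loss f(X) = 0.25 * ||X X^T - Y||_F^2 and its gradient (Y symmetric)."""
    R = X @ X.T - Y
    return 0.25 * np.sum(R * R), R @ X


def preconditioned_gd(X0, Y, alpha=0.5, eta0=0.1, decay=0.5, steps=30):
    """Right-preconditioned gradient descent with geometrically decaying eta.

    Assumed update: X <- X - alpha * grad f(X) @ inv(X^T X + eta_t * I),
    with eta_t = eta0 * decay**t.
    """
    X = np.array(X0, dtype=float)
    eta = float(eta0)
    r = X.shape[1]
    for _ in range(steps):
        _, G = loss_and_grad(X, Y)
        P = X.T @ X + eta * np.eye(r)  # r x r, symmetric positive definite
        # G @ inv(P), computed with a linear solve (valid because P is symmetric).
        X = X - alpha * np.linalg.solve(P, G.T).T
        eta *= decay
    return X


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, r_true, r = 30, 2, 4  # over-parameterized search rank: r > r_true
    Q, _ = np.linalg.qr(rng.standard_normal((n, r_true)))
    Z = Q * np.sqrt([1.0, 0.1])  # ground truth M = Z Z^T, eigenvalues 1 and 0.1
    M = Z @ Z.T
    E = 1e-3 * rng.standard_normal((n, n))
    Y = M + (E + E.T) / 2  # noisy symmetric observation of M
    # Start near the ground truth, padded with extra columns and perturbed.
    X0 = np.hstack([Z, np.zeros((n, r - r_true))])
    X0 = X0 + 0.02 * rng.standard_normal((n, r))
    eta0 = np.linalg.norm(X0 @ X0.T - Y)  # tie the initial eta to the initial residual
    X = preconditioned_gd(X0, Y, eta0=eta0, decay=0.7, steps=25)
    print("initial error:", np.linalg.norm(X0 @ X0.T - M))
    print("final error:  ", np.linalg.norm(X @ X.T - M))
```

The preconditioner is only $r\times r$, so applying it costs little compared with forming the gradient. Setting `eta0` from the initial residual is one simple way to mimic the coupling between $\eta_t$ and the current error that the TL;DR describes; the paper's actual schedule may differ.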

Paper Structure (43 sections, 4 theorems, 38 equations, 9 figures)

INTRODUCTION
Accelerating convergence via preconditioning
Our contribution: How to precondition in the presence of measurement noise?
Limitations
Related Work
Non-convex gradient descent converges to minimax optimality
Accelerating local convergence via preconditioning
Small random initialization
Notations
MAIN RESULTS
KEY IDEA and PROOF SKETCH
Key Innovations
Proof Sketch
NUMERICAL SIMULATIONS
Gaussian matrix sensing
...and 28 more sections

Key Result

Theorem 2.1

Suppose that the initial point $X_{0}$ satisfies $\|\mathcal{A}(X_{0}X_{0}^{T}-M^{\star})\|^{2}<\rho^{2}(1-\delta)\lambda_{r^{\star}}(M^{\star})^{2}$ with a radius $\rho>0$ that satisfies $\rho^{2}/(1-\rho^{2})\le(1-\delta^{2})/2$. Let the step-size $\alpha$ satisfy $\alpha \leq 1/L$, where $L>0$ is a constant [...] where $\mathcal{E}_{opt} = \frac{\sigma^{2}nr\log n}{m}$. Here the inequality $\lesssim$ hides a co[...]
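The radius condition in the theorem can be rearranged into an explicit bound on $\rho$. The short derivation below is a worked illustration, assuming $0<\rho<1$ (so that $1-\rho^{2}>0$) and that $\delta\in[0,1)$ is the restricted isometry constant of Definition 2.1; the value $\delta=1/2$ is chosen only as an example.

$$
\frac{\rho^{2}}{1-\rho^{2}}\le\frac{1-\delta^{2}}{2}
\;\Longleftrightarrow\;
2\rho^{2}\le(1-\delta^{2})(1-\rho^{2})
\;\Longleftrightarrow\;
\rho^{2}\le\frac{1-\delta^{2}}{3-\delta^{2}}.
$$

For $\delta=1/2$ this gives $\rho^{2}\le\frac{3/4}{11/4}=\frac{3}{11}\approx 0.273$, that is, $\rho\lesssim 0.52$. As $\delta\to 0$ the bound relaxes to $\rho^{2}\le 1/3$, and as $\delta\to 1$ the admissible radius shrinks to zero.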

Figures (9)

Figure 1: Preconditioned gradient descent for a 60 megapixel medical image denoising task. We denoise a 2400-frame ultrafast ultrasound image of a rat brain ($200\times 130$ pixels per frame) by running 30 iterations of the low-rank denoising procedure in [demene2015spatiotemporal]. Top-left: original noisy input. Top-right: image denoised and reconstructed by our preconditioning scheme in Algorithm 1. Bottom-left: image obtained via the preconditioning scheme in [zhang2021preconditioned], which is the previous state-of-the-art. Bottom-right: image obtained by naive non-convex gradient descent without preconditioning.
Figure 2: Convergence of our algorithm and ScaledGD$(\lambda)$ using spectral initialization. Left: Noiseless measurements. Right: Noisy measurements with noise level $\sigma=10^{-6}$.
Figure 3: Convergence of our algorithm and PrecGD using spectral initialization. Left: Noiseless measurements. Right: Noisy measurements with noise level $\sigma=10^{-6}$.
Figure 4: Convergence of our algorithm (spectral init.), ScaledGD$(\lambda)$ (small init.) and GD (small init.) for Gaussian matrix sensing. Left: Noiseless measurements. Right: Noisy measurements with noise level $\sigma=10^{-6}$.
Figure 5: Convergence of our algorithm, PrecGD, ScaledGD$(\lambda)$ and GD for Gaussian matrix sensing. Left: Noiseless measurements with $r=r^\star$. Right: Noisy measurements with $r>r^\star$.
...and 4 more figures

Theorems & Definitions (7)

Definition 2.1: Restricted Isometry
Theorem 2.1
Theorem A.1: Noiseless gradient dominance
Lemma A.2
Lemma A.3
proof
proof
