Riemannian Optimization for Non-convex Euclidean Distance Geometry with Global Recovery Guarantees

Chandler Smith; HanQin Cai; Abiy Tasissa

TL;DR

The paper tackles EDG by casting point-configuration recovery from partial distances as low-rank Gram-matrix completion on the rank-$r$ manifold. It introduces two non-convex Riemannian algorithms, one using a non-self-adjoint sampling operator $\mathcal{R}_{\Omega}$ and another employing a self-adjoint surrogate $\mathcal{F}_{\Omega}$, with provable local convergence under RIP-type conditions. Two initialization schemes—one-step hard-thresholding and a resampling-based approach with trimming—yield guarantees and improved sample complexity, including bounds on $m$ such as $m \ge O(n^{7/4} r^2 \log n)$ initially and $m \ge O(n r^2 \log n)$ with refinement. Numerical experiments on synthetic datasets and real protein-structure data show competitive performance, with overparameterization (rank above $r$) offering practical gains and the self-adjoint surrogate delivering strong results in practice. The work advances provable non-convex EDG recovery via a dual-basis Riemannian framework, with clear paths for extending to non-uniform sampling and broader bases.

Abstract

The problem of determining the configuration of points from partial distance information, known as the Euclidean Distance Geometry (EDG) problem, is fundamental to many tasks in the applied sciences. In this paper, we propose two algorithms grounded in the Riemannian optimization framework to address the EDG problem. Our approach formulates the problem as a low-rank matrix completion task over the Gram matrix, using partial measurements represented as expansion coefficients of the Gram matrix in a non-orthogonal basis. For the first algorithm, under a uniform sampling with replacement model for the observed distance entries, we demonstrate that, with high probability, a Riemannian gradient-like algorithm on the manifold of rank-$r$ matrices converges linearly to the true solution, given initialization via a one-step hard thresholding. This holds provided the number of samples, $m$, satisfies $m \geq \mathcal{O}(n^{7/4}r^2 \log(n))$. With a more refined initialization, achieved through resampled Riemannian gradient-like descent, we further improve this bound to $m \geq \mathcal{O}(nr^2 \log(n))$. Our analysis for the first algorithm leverages a non-self-adjoint operator and depends on deriving eigenvalue bounds for an inner product matrix of restricted basis matrices, leveraging sparsity properties for tighter guarantees than previously established. The second algorithm introduces a self-adjoint surrogate for the sampling operator. This algorithm demonstrates strong numerical performance on both synthetic and real data. Furthermore, we show that optimizing over manifolds of higher-than-rank-$r$ matrices yields superior numerical results, consistent with recent literature on overparameterization in the EDG problem.
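As a rough illustration of the setup described in the abstract, the following NumPy sketch builds a rank-$r$ Gram matrix from a synthetic point cloud, reads squared distances off it, and draws observed pairs uniformly with replacement. It also checks that one squared distance is the inner product of the Gram matrix with a sparse, non-orthogonal basis matrix. The function names, problem sizes, and seed are illustrative assumptions and are not taken from the paper.

```python
import numpy as np

def gram_from_points(P):
    """Gram matrix of the centered point cloud P (n points in R^r)."""
    Pc = P - P.mean(axis=0, keepdims=True)
    return Pc @ Pc.T

def sqdist_from_gram(X):
    """Squared distances: D_ij = X_ii + X_jj - 2 X_ij."""
    d = np.diag(X)
    return d[:, None] + d[None, :] - 2.0 * X

def basis_matrix(n, i, j):
    """Sparse symmetric matrix w_ij with <X, w_ij> = D_ij (assumed form)."""
    w = np.zeros((n, n))
    w[i, i] = w[j, j] = 1.0
    w[i, j] = w[j, i] = -1.0
    return w

def sample_pairs(n, m, rng):
    """Draw m index pairs (i < j) uniformly with replacement."""
    iu, ju = np.triu_indices(n, k=1)
    idx = rng.integers(0, iu.size, size=m)
    return iu[idx], ju[idx]

rng = np.random.default_rng(0)
n, r, m = 30, 3, 200              # illustrative sizes only
P = rng.standard_normal((n, r))   # synthetic points
X = gram_from_points(P)           # rank-r Gram matrix
D = sqdist_from_gram(X)           # full squared-distance matrix
rows, cols = sample_pairs(n, m, rng)
observed = D[rows, cols]          # the partial distance measurements
coeff = np.sum(X * basis_matrix(n, 0, 1))  # expansion coefficient for pair (0, 1)
```

Recovering `X` from `observed` alone is the completion problem the two algorithms address; the sketch only sets up the data.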

Paper Structure (30 sections, 21 theorems, 150 equations, 4 figures, 5 tables, 4 algorithms)

Introduction
Contributions
Notation
Organization
Background
Dual Basis
Riemannian Optimization
Matrix Completion
Dual Basis Approach to EDG
Related Work
A Riemannian Approach to Matrix Completion
Euclidean Distance Geometry Algorithms
Related Geometric Approaches to EDG
The Riemannian Dual Basis Approach to EDG
Theoretical Analysis
...and 15 more sections

Key Result

Theorem 5.4

With probability at least $1-2n^{1-\beta}$, for $m\geq \frac{8}{3}\beta\nu^2 r^2 n\log(n)$. In particular, for any $\varepsilon_0>0$ if $m\geq \frac{8}{3}\beta\left(\frac{\nu r}{\varepsilon_0}\right)^2 n\log(n)$. Additionally, under the same conditions as above, we also have

Figures (4)

Figure 1: A diagram of a simple first-order retraction method on $\mathcal{M}_r$. Here $\nabla f(\bm{X}_l)$ is the Euclidean gradient of $f$ at $\bm{X}_l$, $\mathrm{grad}\, f(\bm{X}_l)$ is the Riemannian gradient at $\bm{X}_l$, and the next iterate is $\bm{X}_{l+1} = \mathcal{H}_r(\bm{X}_l - \alpha_l\mathrm{grad}\,f(\bm{X}_l))$.
Figure 2: A schematic of the overall proof of convergence. Arrows indicate how results depend on one another and how they link together to form the proof. For legibility, not every dependency is shown; the figure focuses on the key pieces of the argument.
Figure 3: Structured sampling method for distance matrices proposed in lichtenberg_structured for the experiments.
Figure 4: Target structure (in blue) and numerically estimated structure (in orange) after $100000$ iterations of the $\mathcal{F}_{\Omega}$ descent algorithm. (Left) Target structure 1AX8, $\gamma = 0.3$ and $k=6$ (RMSE = 0.014). (Right) Target structure 1AX8, $\gamma = 0.3$ and $k=6$ (RMSE = 0.06).
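The first-order retraction update in the Figure 1 caption, $\bm{X}_{l+1} = \mathcal{H}_r(\bm{X}_l - \alpha_l\mathrm{grad}\,f(\bm{X}_l))$, can be sketched in NumPy as follows. This is a toy under stated assumptions: the objective is a fully observed least-squares fit $f(\bm{X}) = \tfrac{1}{2}\|\bm{X} - \bm{M}\|_F^2$ rather than the paper's sampled objective, matrices are symmetric, and the step size, sizes, and helper names are hypothetical.

```python
import numpy as np

def hard_threshold(Z, r):
    """H_r: keep the r eigenpairs of symmetric Z with largest |eigenvalue|."""
    Z = 0.5 * (Z + Z.T)
    w, V = np.linalg.eigh(Z)
    keep = np.argsort(np.abs(w))[-r:]
    U = V[:, keep]
    return (U * w[keep]) @ U.T, U

def project_tangent(Z, U):
    """Tangent-space projection at X = U diag(w) U^T:
    P_T(Z) = P_U Z + Z P_U - P_U Z P_U, with P_U = U U^T."""
    PUZ = U @ (U.T @ Z)
    ZPU = (Z @ U) @ U.T
    return PUZ + ZPU - U @ (U.T @ Z @ U) @ U.T

rng = np.random.default_rng(1)
n, r, alpha = 20, 3, 1.0           # illustrative sizes and step size
P = rng.standard_normal((n, r))
M = P @ P.T                        # rank-r target (toy ground truth)

G = rng.standard_normal((n, n))
noise = 0.01 * 0.5 * (G + G.T)     # small symmetric perturbation
X, U = hard_threshold(M + noise, r)  # initialization near the target

for _ in range(10):
    egrad = X - M                        # Euclidean gradient of the toy objective
    rgrad = project_tangent(egrad, U)    # Riemannian gradient
    X, U = hard_threshold(X - alpha * rgrad, r)  # retract to rank r

rel_err = np.linalg.norm(X - M) / np.linalg.norm(M)
```

In this toy setting every entry is observed, so the example illustrates only the project-step-retract structure of the iteration, not the sampling operators or the recovery guarantees analyzed in the paper.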

Theorems & Definitions (47)

Remark 1
Remark 2
Remark 3
Theorem 5.4: Restricted Isometry Property (RIP) for $\mathcal{P}_{\mathbb{T}}\mathcal{R}_{\Omega}\mathcal{P}_{\mathbb{T}}$
proof : Proof sketch
Theorem 5.5: Local Convergence of the $\mathcal{R}_{\Omega}$ descent algorithm
proof : Proof sketch of Theorem 5.5
Lemma 5.6: Initialization via One Step Hard Thresholding
proof
Theorem 5.7: Recovery Guarantee I
...and 37 more
