Riemannian Optimization for Non-convex Euclidean Distance Geometry with Global Recovery Guarantees

Chandler Smith, HanQin Cai, Abiy Tasissa

TL;DR

The paper tackles the Euclidean Distance Geometry (EDG) problem by casting point-configuration recovery from partial distances as low-rank Gram-matrix completion on the manifold of rank-$r$ matrices. It introduces two non-convex Riemannian algorithms, one using a non-self-adjoint sampling operator $\mathcal{R}_{\Omega}$ and another employing a self-adjoint surrogate $\mathcal{F}_{\Omega}$, with provable local convergence under RIP-type conditions. Two initialization schemes are analyzed: one-step hard thresholding, which gives a recovery guarantee with $m \geq \mathcal{O}(n^{7/4} r^2 \log n)$ samples, and a resampling-based approach with trimming, which improves the requirement to $m \geq \mathcal{O}(n r^2 \log n)$. Numerical experiments on synthetic datasets and real protein-structure data show competitive performance, with overparameterization (optimizing at a rank above $r$) offering practical gains and the self-adjoint surrogate performing strongly in practice. The work advances provable non-convex EDG recovery via a dual-basis Riemannian framework, with possible extensions to non-uniform sampling and broader bases.
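As a rough illustration of the gap between the two sample-complexity orders quoted above, the following worked comparison ignores the hidden constants (an assumption, since the $\mathcal{O}(\cdot)$ notation does not specify them) and uses $n = 10^4$ purely as an example value:

$$
\frac{n^{7/4}\, r^2 \log n}{n\, r^2 \log n} = n^{3/4},
\qquad
n = 10^4 \;\Longrightarrow\; n^{3/4} = 10^{3}.
$$

For comparison, the total number of distinct pairwise distances is $n(n-1)/2 \approx n^2/2$, so for fixed $r$ both requirements grow more slowly than the number of available entries.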

Abstract

The problem of determining the configuration of points from partial distance information, known as the Euclidean Distance Geometry (EDG) problem, is fundamental to many tasks in the applied sciences. In this paper, we propose two algorithms grounded in the Riemannian optimization framework to address the EDG problem. Our approach formulates the problem as a low-rank matrix completion task over the Gram matrix, using partial measurements represented as expansion coefficients of the Gram matrix in a non-orthogonal basis. For the first algorithm, under a uniform sampling with replacement model for the observed distance entries, we demonstrate that, with high probability, a Riemannian gradient-like algorithm on the manifold of rank-$r$ matrices converges linearly to the true solution, given initialization via a one-step hard thresholding. This holds provided the number of samples, $m$, satisfies $m \geq \mathcal{O}(n^{7/4}r^2 \log(n))$. With a more refined initialization, achieved through resampled Riemannian gradient-like descent, we further improve this bound to $m \geq \mathcal{O}(nr^2 \log(n))$. Our analysis for the first algorithm leverages a non-self-adjoint operator and depends on deriving eigenvalue bounds for an inner product matrix of restricted basis matrices, leveraging sparsity properties for tighter guarantees than previously established. The second algorithm introduces a self-adjoint surrogate for the sampling operator. This algorithm demonstrates strong numerical performance on both synthetic and real data. Furthermore, we show that optimizing over manifolds of higher-than-rank-$r$ matrices yields superior numerical results, consistent with recent literature on overparameterization in the EDG problem.
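The link between squared distances and the Gram matrix that underlies this formulation can be sketched in a few lines. The snippet below is a minimal illustration, not the authors' code: it assumes points stored as rows, a centered Gram matrix, the standard identity $D_{ij} = X_{ii} + X_{jj} - 2X_{ij}$, and uniform sampling with replacement over pairs $i<j$. All function names are hypothetical.

```python
import numpy as np

def gram_from_points(P):
    """Centered Gram matrix X = Pc Pc^T for points stored as rows of P."""
    Pc = P - P.mean(axis=0, keepdims=True)
    return Pc @ Pc.T

def sq_distances_from_gram(X):
    """Squared distances via D_ij = X_ii + X_jj - 2 X_ij."""
    d = np.diag(X)
    return d[:, None] + d[None, :] - 2.0 * X

def gram_from_sq_distances(D):
    """Classical relation X = -1/2 J D J with J = I - (1/n) 1 1^T."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n
    return -0.5 * J @ D @ J

def sample_entries(D, m, rng):
    """Draw m pairs i<j uniformly with replacement; return indices and distances."""
    n = D.shape[0]
    iu, ju = np.triu_indices(n, k=1)
    picks = rng.integers(0, iu.size, size=m)
    return iu[picks], ju[picks], D[iu[picks], ju[picks]]

rng = np.random.default_rng(0)
P = rng.standard_normal((8, 3))      # 8 points in R^3 (illustrative sizes)
X = gram_from_points(P)              # rank-3 Gram matrix
D = sq_distances_from_gram(X)        # full squared-distance matrix
X_back = gram_from_sq_distances(D)   # recovers X when all distances are known
i, j, d_obs = sample_entries(D, m=10, rng=rng)
```

Each observed distance is a linear functional of the Gram matrix, which is why recovery from a subset of distances can be posed as low-rank completion of $X$ rather than of $D$.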

Paper Structure (30 sections, 21 theorems, 150 equations, 4 figures, 5 tables, 4 algorithms)

Key Result

Theorem 5.4

With probability at least $1-2n^{1-\beta}$, for $m\geq \frac{8}{3}\beta\nu^2 r^2 n\log(n)$. In particular, for any $\varepsilon_0>0$ if $m\geq \frac{8}{3}\beta\left(\frac{\nu r}{\varepsilon_0}\right)^2 n\log(n)$. Additionally, under the same conditions as above, we also have
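To get a feel for the sample thresholds in the statement above, the sketch below simply evaluates them numerically. It is an illustration under assumptions: $\nu$ is treated as a given incoherence-type constant, the parameter values are arbitrary, and the function names are hypothetical. Setting `eps0=1` makes the second threshold coincide algebraically with the first.

```python
import math

def rip_sample_threshold(n, r, nu, beta, eps0=1.0):
    """Evaluate (8/3) * beta * (nu * r / eps0)^2 * n * log(n)."""
    return (8.0 / 3.0) * beta * (nu * r / eps0) ** 2 * n * math.log(n)

def failure_probability_bound(n, beta):
    """Evaluate 2 * n^(1 - beta), the complement of the stated success probability."""
    return 2.0 * n ** (1.0 - beta)

# Illustrative values only (not taken from the paper's experiments).
n, r, nu, beta = 1000, 3, 1.0, 2.0
m_basic = rip_sample_threshold(n, r, nu, beta)            # first threshold
m_tight = rip_sample_threshold(n, r, nu, beta, eps0=0.5)  # threshold for eps0 = 0.5
p_fail = failure_probability_bound(n, beta)
total_pairs = n * (n - 1) // 2                            # distinct pairwise distances

print(f"m_basic ~ {m_basic:.0f}, m_tight ~ {m_tight:.0f}, pairs = {total_pairs}")
```

Halving $\varepsilon_0$ quadruples the required number of samples, while raising $\beta$ lowers the failure probability at the cost of a linearly larger threshold.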

Figures (4)

  • Figure 1: A diagram of a simple first-order retraction method on $\mathcal{M}_r$. Here, $\nabla f(\bm{X}_l)$ is the Euclidean gradient of $f$ at $\bm{X}_l$, $\mathrm{grad}\, f(\bm{X}_l)$ is the Riemannian gradient at $\bm{X}_l$, and $\bm{X}_{l+1} = \mathcal{H}_r(\bm{X}_l - \alpha_l\mathrm{grad}\,f(\bm{X}_l))$, as in \ref{eqn: first order retraction}.
  • Figure 2: This diagram is a schematic of the overall proof of convergence. Arrows indicate how results depend on one another, and how they link together to form the overall proof of convergence. Not every exact dependency is shown in this figure for legibility purposes, instead focusing on the key pieces of the overall flow of the argument.
  • Figure 3: Structured sampling method for distance matrices, proposed in \cite{lichtenberg_structured}, used for the experiments.
  • Figure 4: Target structure (in blue) and numerically estimated structure (in orange) following $100000$ iterations of Algorithm \ref{alg:F_omega descent}. (Left) Target structure 1AX8, $\gamma = 0.3$ and $k=6$ (RMSE = 0.014). (Right) Target structure 1AX8, $\gamma = 0.3$ and $k=6$ (RMSE = 0.06).
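The update in the Figure 1 caption, $\bm{X}_{l+1} = \mathcal{H}_r(\bm{X}_l - \alpha_l\,\mathrm{grad}\,f(\bm{X}_l))$, can be sketched as follows. This is a minimal illustration under assumptions, not the paper's algorithm: it uses a toy objective $f(\bm{X}) = \tfrac{1}{2}\|\bm{X} - \bm{M}\|_F^2$ with a known rank-$r$ target $\bm{M}$ instead of the sampling operators $\mathcal{R}_{\Omega}$ or $\mathcal{F}_{\Omega}$, takes $\mathcal{H}_r$ to be truncated SVD, and uses the standard tangent-space projection for fixed-rank matrices. Function names are hypothetical.

```python
import numpy as np

def hard_threshold(Z, r):
    """H_r: best rank-r approximation of Z via truncated SVD."""
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    return (U[:, :r] * s[:r]) @ Vt[:r, :]

def tangent_project(X, G, r):
    """Project G onto the tangent space of the rank-r manifold at X:
    P_T(G) = U U^T G + G V V^T - U U^T G V V^T."""
    U, _, Vt = np.linalg.svd(X, full_matrices=False)
    U, V = U[:, :r], Vt[:r, :].T
    UUtG = U @ (U.T @ G)
    GVVt = (G @ V) @ V.T
    return UUtG + GVVt - U @ (U.T @ GVVt)

def retraction_step(X, euclid_grad, alpha, r):
    """One first-order step: Riemannian gradient, then retraction by H_r."""
    rgrad = tangent_project(X, euclid_grad(X), r)
    return hard_threshold(X - alpha * rgrad, r)

# Toy problem (assumed for illustration): recover a rank-3 matrix M.
rng = np.random.default_rng(1)
B = rng.standard_normal((20, 3))
M = B @ B.T
X = hard_threshold(M + 0.01 * rng.standard_normal((20, 20)), 3)
err0 = np.linalg.norm(X - M)
for _ in range(5):
    X = retraction_step(X, lambda Y: Y - M, alpha=1.0, r=3)
err = np.linalg.norm(X - M)
```

The projection keeps the search direction in the tangent space at the current iterate, and the truncated SVD maps the result back to rank $r$, which is the two-stage picture the figure depicts.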

Theorems & Definitions (47)

  • Remark 1
  • Remark 2
  • Remark 3
  • Theorem 5.4: Restricted Isometry Property (RIP) for $\mathcal{P}_{\mathbb{T}}\mathcal{R}_{\Omega}\mathcal{P}_{\mathbb{T}}$
  • proof: Proof sketch
  • Theorem 5.5: Local Convergence of Algorithm \ref{alg:R_omega descent}
  • proof: Proof sketch of Theorem \ref{thm: Local Convergence}
  • Lemma 5.6: Initialization via One Step Hard Thresholding
  • proof
  • Theorem 5.7: Recovery Guarantee I
  • ...and 37 more