A Dual Basis Approach for Structured Robust Euclidean Distance Geometry

Chandra Kundu, Abiy Tasissa, HanQin Cai

TL;DR

RoDEoDB addresses robust Euclidean Distance Geometry under structured anchor–target observations with sparse outliers by exploiting a non-orthogonal dual-basis mapping between the distance matrix and a low-rank Gram matrix. The method operates in two phases: first, a Dual Basis Alternating Projections (DBAP) step robustly recovers the Gram block from corrupted anchor–target data; second, Nyström reconstruction yields the full Gram matrix and the $d$-dimensional point configuration. Theoretical guarantees under $\mu$-incoherence and $\alpha$-sparsity show exact recovery of both the Gram matrix and the point set with high probability, and empirical results on synthetic and molecular datasets demonstrate superior robustness and accuracy compared to baselines, especially with limited anchors and higher corruption. This framework enables reliable localization and conformation tasks in sensor networks and molecular modeling where only anchor–target distances are available and noisy.
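
The second phase named above, Nyström reconstruction, can be illustrated on clean data. The sketch below is not the authors' implementation: the function names `nystrom_complete` and `points_from_gram` are hypothetical, the anchors are assumed to be the first `m` points, and the anchor–anchor block `A` and anchor–target block `B` of the Gram matrix are assumed to be known exactly (in the method they would come out of the DBAP phase, which is not shown here).

```python
import numpy as np

def nystrom_complete(A, B):
    """Fill in a low-rank PSD matrix [[A, B], [B^T, C]] using C = B^T A^+ B."""
    # Explicit cutoff so numerically-zero singular values of A are discarded.
    C = B.T @ np.linalg.pinv(A, rcond=1e-10) @ B
    return np.block([[A, B], [B.T, C]])

def points_from_gram(X, d):
    """Top-d eigenpairs of a PSD Gram matrix give points up to rotation/reflection."""
    vals, vecs = np.linalg.eigh(X)
    idx = np.argsort(vals)[::-1][:d]
    return vecs[:, idx] * np.sqrt(np.clip(vals[idx], 0.0, None))

rng = np.random.default_rng(1)
T, m, d = 40, 6, 2                 # total points, anchors, dimension (toy values)
P = rng.standard_normal((T, d))    # ground-truth configuration
X_true = P @ P.T                   # rank-d Gram matrix

A = X_true[:m, :m]                 # anchor-anchor Gram block
B = X_true[:m, m:]                 # anchor-target Gram block
X_hat = nystrom_complete(A, B)     # target-target block is never observed
P_hat = points_from_gram(X_hat, d)
```

The completion is exact here because the anchor block `A` already has rank `d`; the recovered `P_hat` matches `P` only up to an orthogonal transformation, which is why `P_hat @ P_hat.T` rather than `P_hat` itself is compared against the truth.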

Abstract

A Euclidean distance matrix (EDM), which consists of the pairwise squared Euclidean distances of a given point configuration, finds many applications in modern machine learning. This paper considers the setting where only a set of anchor nodes is used to collect the distances among themselves and to the remaining nodes. In the presence of potential outliers, this yields a structured partial observation of the EDM with partial corruptions. An EDM can be connected to a positive semi-definite Gram matrix via a non-orthogonal dual basis. Inspired by recent developments on non-orthogonal dual bases in optimization, we propose a novel algorithmic framework, dubbed Robust Euclidean Distance Geometry via Dual Basis (RoDEoDB), for recovering the Euclidean distance geometry, i.e., the underlying point configuration. Exact recovery guarantees are established for both the Gram matrix and the point configuration under mild conditions. Empirical experiments show superior performance of RoDEoDB on sensor localization and molecular conformation datasets.
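
The link between an EDM and its Gram matrix can be made concrete with the classical identities $D_{ij} = X_{ii} + X_{jj} - 2X_{ij}$ and $X = -\tfrac{1}{2} J D J$, where $J$ is the centering matrix. The following is a minimal sketch of these two maps only; the helper names `gram_to_edm` and `edm_to_gram` are hypothetical, and the dual-basis machinery of the paper is not reproduced.

```python
import numpy as np

def gram_to_edm(X):
    """Squared-distance matrix from a Gram matrix: D_ij = X_ii + X_jj - 2 X_ij."""
    diag = np.diag(X)
    return diag[:, None] + diag[None, :] - 2.0 * X

def edm_to_gram(D):
    """Centered Gram matrix from an EDM by double centering: X = -1/2 J D J."""
    T = D.shape[0]
    J = np.eye(T) - np.ones((T, T)) / T
    return -0.5 * J @ D @ J

rng = np.random.default_rng(0)
T, d = 50, 3                       # toy sizes, chosen for illustration
P = rng.standard_normal((T, d))
P -= P.mean(axis=0)                # center the points so P P^T is the centered Gram matrix
X = P @ P.T
D = gram_to_edm(X)
X_back = edm_to_gram(D)

rank_X = np.linalg.matrix_rank(X, tol=1e-8)
rank_D = np.linalg.matrix_rank(D, tol=1e-8)
```

For points in general position, `rank_X` equals `d` while `rank_D` equals `d + 2`, which is the rank assumption that appears in Theorem 3.1.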

Paper Structure

This paper contains 22 sections, 17 theorems, 66 equations, 6 figures, 2 tables, 2 algorithms.

Key Result

Theorem 3.1

Let $\bm{D} \in \mathbb{R}^{T \times T}$ be a $\mu$-incoherent EDM with $\mathop{\mathrm{rank}}\nolimits(\bm{D}) = d+2$. Suppose that the set of anchor indices ${\mathcal{I}} \subseteq [T]$ is sampled uniformly without replacement, and that $m = |{\mathcal{I}}|$ satisfies $m \geq \gamma (d+2) \sqrt{\cdots}$ [...] with probability at least $1 - \frac{2d}{T^{c(\delta + (1-\delta)\log(1-\delta))}}$.

Figures (6)

  • Figure 1: Phase transition over 1000 trials showing recovery rate (RMSE $\leq 1$) for $T = 500$ sensors across varying anchor counts $m$ and outlier density $\alpha$. Top row: $d = 2$; Bottom row: $d = 3$. From left to right: RoDEoDB, SREDG, and GD.
  • Figure 2: Reconstruction of synthetic 2D spiral dataset embedded in $\mathbb{R}^{10}$ with Gaussian noise and $\alpha = 20\%$ sparse outliers. Points are colored by their original angular index $\theta$; anchor locations are highlighted by black '+' symbols. Left: original spiral. Right: Procrustes-aligned reconstructions obtained with RoDEoDB (top row) and SREDG (bottom row) for anchor counts $m \in \{20, 30, 40\}$.
  • Figure 3: Reconstruction of protein 1AX8 by RoDEoDB under different corruption levels. 30 anchors are used in all panels. The reconstructed protein structure (in brown) closely aligns with the true protein structure (in blue). Visualizations rendered using PyMOL [pymol].
  • Figure 4: Visual reconstruction results for the 2D sensor localization task with $100$ total points and $20$ anchors. Each panel shows a representative embedding recovered under different corruption scenarios.
  • Figure 5: Comparison of RoDEoDB and SREDG on 3D synthetic sensor localization data with $20\%$ outliers among $500$ total points, showing RMSE versus anchor count averaged over $1000$ trials.
  • ...and 1 more figure
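
The captions above refer to Procrustes-aligned reconstructions and to RMSE as the recovery criterion. A recovered configuration is only determined up to a rigid motion, so it has to be aligned before an error is computed. The sketch below shows one common way to do this; the function names are hypothetical, and the RMSE normalization (root of the mean squared per-point error) is an assumption, since the source does not state the paper's exact definition.

```python
import numpy as np

def procrustes_align(P_hat, P_ref):
    """Align P_hat to P_ref with the best translation plus rotation/reflection."""
    mu_hat, mu_ref = P_hat.mean(axis=0), P_ref.mean(axis=0)
    A, B = P_hat - mu_hat, P_ref - mu_ref
    U, _, Vt = np.linalg.svd(A.T @ B)
    Q = U @ Vt                     # orthogonal Q minimizing ||A Q - B||_F
    return A @ Q + mu_ref

def rmse(P_hat, P_ref):
    """Root of the mean squared Euclidean error per point (assumed normalization)."""
    return np.sqrt(np.mean(np.sum((P_hat - P_ref) ** 2, axis=1)))

rng = np.random.default_rng(2)
T, d = 100, 3
P_ref = rng.standard_normal((T, d))

# Simulate a reconstruction that differs from the truth by a rigid motion.
R, _ = np.linalg.qr(rng.standard_normal((d, d)))   # random orthogonal matrix
P_hat = P_ref @ R + np.array([3.0, -1.0, 2.0])

err_before = rmse(P_hat, P_ref)
err_after = rmse(procrustes_align(P_hat, P_ref), P_ref)
```

In this toy case `err_after` is zero up to floating-point error, while `err_before` is large, which shows why the alignment step has to precede any RMSE threshold such as the one used in Figure 1.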

Theorems & Definitions (33)

  • Definition 3.1: $\mu$-incoherence [candes2012exact, recht2011simpler]
  • Definition 3.2: $\alpha$-sparsity [yi2016fast, cai2019accelerated]
  • Theorem 3.1
  • Theorem 3.2
  • Lemma A.1
  • proof
  • Corollary A.2
  • proof
  • Lemma A.3
  • proof
  • ...and 23 more
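
Definitions 3.1 and 3.2 are listed above but not stated in this excerpt. As a rough guide, the sketch below computes the quantities as they are commonly defined in the low-rank recovery literature: a rank-$r$ symmetric $n \times n$ matrix with leading singular vectors $U$ is called $\mu$-incoherent when $\max_i \|U^\top e_i\|_2^2 \leq \mu r / n$, and a matrix is called $\alpha$-sparse when every row and every column has at most an $\alpha$ fraction of nonzero entries. Both forms are assumptions here and may differ in detail from the paper's definitions; the function names are hypothetical.

```python
import numpy as np

def incoherence(M, r):
    """Smallest mu with max_i ||U^T e_i||^2 <= mu * r / n, U = top-r left singular vectors."""
    U, _, _ = np.linalg.svd(M)
    U = U[:, :r]
    n = M.shape[0]
    return n / r * np.max(np.sum(U ** 2, axis=1))

def sparsity_level(S):
    """Largest fraction of nonzero entries in any single row or column of S."""
    nz = S != 0
    return max(nz.mean(axis=1).max(), nz.mean(axis=0).max())

# Incoherence of a small synthetic EDM (rank d + 2 for generic points).
rng = np.random.default_rng(3)
n, d = 60, 2
P = rng.standard_normal((n, d))
g = np.sum(P ** 2, axis=1)
D = g[:, None] + g[None, :] - 2.0 * P @ P.T
mu = incoherence(D, d + 2)

# Sparsity level of a small hand-made outlier matrix.
S = np.zeros((5, 8))
S[0, :4] = 1.0        # row 0 has 4 of 8 entries nonzero
S[1:3, 0] = -2.0      # column 0 now has 3 of 5 entries nonzero
alpha = sparsity_level(S)
```

Under this definition `mu` always lies between 1 and `n / r`, with smaller values meaning the singular vectors are spread more evenly across coordinates.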