Ellipsoid fitting with the Cayley transform

Omar Melikechi, David B. Dunson

TL;DR

CTEF is an ellipsoid-specific fitting method that uses the Cayley transform to parameterize rotations, which turns the fit into a bound-constrained optimization problem in Euclidean space. By fitting transformed data over a carefully designed feasible set, it is invariant to translations and rotations and remains robust when data are nonuniformly distributed on an ellipsoid, including in high dimensions. The fitted ellipsoid parameters are interpretable and support dimension reduction, visualization, and clustering. CTEF outperforms other fitting methods on synthetic Ellipsoid-Gaussian data and is applied to real-world tasks such as cell-cycle visualization and circadian gene analysis. Although slower than some baselines, CTEF is deterministic and reproducible, with practical runtimes and a clear geometric interpretation, which makes it well suited to applications that require stable recovery of global structure.

Abstract

We introduce Cayley transform ellipsoid fitting (CTEF), an algorithm that uses the Cayley transform to fit ellipsoids to noisy data in any dimension. Unlike many ellipsoid fitting methods, CTEF is ellipsoid specific, meaning it always returns elliptic solutions, and can fit arbitrary ellipsoids. It also significantly outperforms other fitting methods when data are not uniformly distributed over the surface of an ellipsoid. Inspired by growing calls for interpretable and reproducible methods in machine learning, we apply CTEF to dimension reduction, data visualization, and clustering in the context of cell cycle and circadian rhythm data and several classical toy examples. Since CTEF captures global curvature, it extracts nonlinear features in data that other machine learning methods fail to identify. For example, on the clustering examples CTEF outperforms 10 popular algorithms.

Paper Structure

This paper contains 20 sections, 2 theorems, 25 equations, 18 figures, 2 tables, 2 algorithms.

Key Result

Proposition 2.2

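Fix $x\in\mathbb{R}^p$ and define $\ell:\mathbb{R}^p_+\times\mathbb{R}^p\times\mathbb{R}^{p(p-1)/2}\to\mathbb{R}$ by $\ell(a,c,s)=\tfrac{1}{2}\lVert AR(x-c)\rVert^2$ with $A=\operatorname{diag}(a)$ and $R=\operatorname{Cay}(S(s))$ as above. Then $$\frac{\partial\ell}{\partial a}=a\odot Ry\odot Ry,\qquad \frac{\partial\ell}{\partial c}=-R^TA^2Ry,$$ and the entries of $\partial\ell/\partial s$ are, up to the sign and ordering fixed by the definition of $S(s)$, the off-diagonal entries of $B-B^T$, where $y=x-c$, $\odot$ is the Hadamard product, and $B=(I-S(s))^{-1}A^2Ryy^T(I+R^T)$.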
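Differentiating $\ell$ directly gives $\partial\ell/\partial a=a\odot Ry\odot Ry$ and $\partial\ell/\partial c=-R^TA^2Ry$, and the matrix $B$ enters the gradient with respect to $s$. The following is a minimal NumPy sketch that checks these expressions against central finite differences. It is not the authors' implementation. Two conventions are assumptions, since their definitions lie outside this excerpt: $\operatorname{Cay}(S)=(I+S)^{-1}(I-S)$, and $S(s)$ places $s$ row by row in the strict upper triangle of a skew-symmetric matrix. Under these assumptions the $s$-gradient is the strict upper triangle of $B^T-B$; a different convention for $S(s)$ would flip the sign or reorder the entries. All function names are hypothetical.

```python
import numpy as np

def skew_from_vector(s, p):
    # Assumed parameterization S(s): s fills the strict upper triangle
    # row by row, and the lower triangle is its negative.
    U = np.zeros((p, p))
    U[np.triu_indices(p, k=1)] = s
    return U - U.T

def cayley(S):
    # Assumed convention: Cay(S) = (I + S)^{-1} (I - S).
    I = np.eye(S.shape[0])
    return np.linalg.solve(I + S, I - S)

def loss(a, c, s, x):
    # l(a, c, s) = 0.5 * ||A R (x - c)||^2 with A = diag(a), R = Cay(S(s)).
    R = cayley(skew_from_vector(s, x.size))
    return 0.5 * np.sum((a * (R @ (x - c))) ** 2)

def analytic_grads(a, c, s, x):
    p = x.size
    S = skew_from_vector(s, p)
    R = cayley(S)
    y = x - c
    Ry = R @ y
    grad_a = a * Ry * Ry                 # a ⊙ Ry ⊙ Ry
    grad_c = -R.T @ (a**2 * Ry)          # -R^T A^2 R y
    I = np.eye(p)
    # B = (I - S)^{-1} A^2 R y y^T (I + R^T)
    B = np.linalg.solve(I - S, np.outer(a**2 * Ry, y) @ (I + R.T))
    # Under the conventions assumed above: strict upper triangle of B^T - B.
    grad_s = (B.T - B)[np.triu_indices(p, k=1)]
    return grad_a, grad_c, grad_s

def numeric_grad(f, v, h=1e-6):
    # Central finite differences, one coordinate at a time.
    g = np.zeros_like(v)
    for i in range(v.size):
        e = np.zeros_like(v)
        e[i] = h
        g[i] = (f(v + e) - f(v - e)) / (2 * h)
    return g

rng = np.random.default_rng(0)
p = 4
x = rng.normal(size=p)
a = rng.uniform(0.5, 2.0, size=p)        # axis parameters must be positive
c = rng.normal(size=p)
s = rng.normal(scale=0.5, size=p * (p - 1) // 2)

ga, gc, gs = analytic_grads(a, c, s, x)
err_a = np.max(np.abs(ga - numeric_grad(lambda v: loss(v, c, s, x), a)))
err_c = np.max(np.abs(gc - numeric_grad(lambda v: loss(a, v, s, x), c)))
err_s = np.max(np.abs(gs - numeric_grad(lambda v: loss(a, c, v, x), s)))
print("max abs errors (a, c, s):", err_a, err_c, err_s)
```

Closed-form gradients of this kind are what let a bound-constrained optimizer work directly on $(a,c,s)$ in Euclidean space, since the rotation constraint is absorbed into the Cayley parameterization.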

Figures (18)

  • Figure 1: Feasible sets (blue) for the center when $w=1$ (left) and $w=0.5$ (right). Samples $X=\{x^{(i)}\}$ are drawn from the Ellipsoid-Gaussian distribution described in the experiments section with $\tau=2$ and $1\%$ noise, then transformed by $\Phi_X$. The star $c_0$ marks the midpoint of $w[c^-,c^+]$ and is the default initial value in our algorithm. The initial value for $a$ is the vector of all ones. Note that $c_0$ does not depend on $w$.
  • Figure 2: (Left) Best fit ellipsoids for different feasible set weights $w$, as defined in the section on feasible sets. Each color represents a different $w$. The best fit ellipsoid when $w=0.5$ (orange) closely resembles the true ellipsoid (thick blue curve). (Right) Loss corresponding to each $w$. For example, the loss when $w=0.5$ is approximately $0.01$. While the loss decreases monotonically to $0$, the centers and axis lengths of the fitted ellipsoids diverge. Here $n=30$ data points (black dots, left panel) are simulated from the Ellipsoid-Gaussian model with $\tau=2$ and axis ratio $3$.
  • Figure 3: Errors for different $\tau$ with $p=3$, $r_{ax}=2.5$, noise $=1\%$, and $n=18$. Only CTEF is stable for all values of $\tau$.
  • Figure 4: Errors for different noise values with $p=3$, $\tau=0$, $r_{ax}=2.5$, and $n=18$. Only CTEF, SOD, and HES are stable as noise increases.
  • Figure 5: Errors for different axis ratios with $p=3$, $\tau=0$, noise $=1\%$, and $n=18$. All methods except HES perform similarly.
  • ...and 13 more figures

Theorems & Definitions (9)

  • Remark 2.1
  • Proposition 2.2
  • proof
  • Example 2.3
  • Proposition 2.4
  • proof
  • Remark 2.5
  • Remark 2.6
  • Remark 4.1