Efficient Data-Driven Leverage Score Sampling Algorithm for the Minimum Volume Covering Ellipsoid Problem in Big Data

Elizabeth Harris, Ali Eshragh, Bishnu Lamichhane, Jordan Shaw-Carmody, Elizabeth Stojanovski

Abstract

The Minimum Volume Covering Ellipsoid (MVCE) problem, characterised by $n$ observations in $d$ dimensions where $n \gg d$, can be computationally very expensive in the big data regime. We apply methods from randomised numerical linear algebra to develop a data-driven leverage score sampling algorithm for solving MVCE, and establish theoretical error bounds and a convergence guarantee. Assuming the leverage scores follow a power law decay, we show that the computational complexity of computing the approximation for MVCE is reduced from $\mathcal{O}(nd^2)$ to $\mathcal{O}(nd + \text{poly}(d))$, which is a significant improvement in big data problems. Numerical experiments demonstrate the efficacy of our new algorithm, showing that it substantially reduces computation time and yields near-optimal solutions.
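The basic idea of leverage score sampling can be sketched as follows. This is a minimal illustration, not the authors' algorithm: it computes exact leverage scores through a thin QR factorisation, which itself costs $\mathcal{O}(nd^2)$, so it does not reproduce the data-driven approximation or the complexity reduction claimed above. The function names, the synthetic Gaussian data, and the sample size `s = 200` are all assumptions made for illustration.

```python
import numpy as np


def leverage_scores(X):
    """Exact leverage scores of the rows of X (n x d, assumed full column rank).

    The i-th score is the squared norm of row i of Q in the thin QR
    factorisation X = QR; the scores sum to d.
    """
    Q, _ = np.linalg.qr(X, mode="reduced")
    return np.sum(Q**2, axis=1)


def sample_rows(X, s, rng):
    """Draw s row indices with probability proportional to leverage score."""
    scores = leverage_scores(X)
    probs = scores / scores.sum()
    idx = rng.choice(X.shape[0], size=s, replace=True, p=probs)
    return idx, probs


# Hypothetical data: n = 2000 observations in d = 5 dimensions.
rng = np.random.default_rng(0)
X = rng.standard_normal((2000, 5))

idx, probs = sample_rows(X, 200, rng)
X_small = X[idx]  # reduced data set on which an MVCE solver could then be run
```

Rows with high leverage are the ones most likely to lie near the boundary of the covering ellipsoid, which is why sampling in proportion to these scores is a natural way to shrink the problem before solving it.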

Paper Structure

This paper contains 25 sections, 14 theorems, 75 equations, 3 figures, 5 tables, 2 algorithms.

Key Result

Proposition 2.1

If we have a $\delta$-primal feasible (or $\delta$-approximately optimal) solution $\bm{u}$, then $\bm{u}$ and $(1+\delta)^{-1} \bm{Q}\left( \bm{u} \right)$ are both within $d \log \left( 1 + \delta \right)$ of being optimal in Dual and Primal, respectively.
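
To give a sense of scale for this bound, here is a worked instance with illustrative values. The choices $d = 10$ and $\delta = 0.01$ are assumptions made here, not values from the paper, and $\log$ is taken to be the natural logarithm:

$$
d \log(1+\delta) = 10 \log(1.01) \approx 10 \times 0.00995 = 0.0995 \le d\,\delta = 0.1 .
$$

Since $\log(1+\delta) \le \delta$ for all $\delta > -1$, the gap is always at most $d\,\delta$, so for small $\delta$ it grows roughly linearly in both the dimension and the tolerance.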

Figures (3)

  • Figure 1: Rotated Cauchy: Calculated optimality gap summary (a) and time summary (b).
  • Figure 2: Lognormal: Calculated optimality gap summary (a) and time summary (b).
  • Figure 3: Gaussian: Calculated optimality gap summary (a) and time summary (b).

Theorems & Definitions (24)

  • Proposition 2.1: todd2016book, Proposition 2.9
  • Theorem 3.1: mccurdy2019deterministic, Theorem 1
  • Theorem 5.1
  • proof
  • Theorem 6.1
  • proof
  • Theorem A.1
  • proof
  • Theorem B.1: eshragh2023sequential, Theorem 4
  • Corollary B.2
  • ...and 14 more