Online Adaptive Mahalanobis Distance Estimation

Lianke Qin, Aravind Reddy, Zhao Song

TL;DR

This work introduces the first dimension-reduction framework for Mahalanobis distances: an online-adaptive Approximate Distance Estimation (ADE) data structure built on Johnson-Lindenstrauss sketches. It supports efficient estimation of the distance from a query to every data point, under adaptively chosen queries and online updates to both the data and the metric, represented as $A=U^\top U$, and returns $(1\pm\varepsilon)$-accurate estimates with high probability. The authors give a Monte Carlo data structure and an online-maintenance variant with formal guarantees for the Initialize, UpdateU, UpdateX, QueryPair, QueryAll, and SampleExact operations, and they evaluate accuracy, running time, and memory on benchmark datasets. By combining sketching with online metric updates, the approach is compatible with online Mahalanobis metric learning and with real-time learning and decision systems in which the metric evolves over time.

Abstract

Mahalanobis metrics are widely used in machine learning in conjunction with methods like $k$-nearest neighbors, $k$-means clustering, and $k$-medians clustering. Despite their importance, there has not been any prior work on applying sketching techniques to speed up algorithms for Mahalanobis metrics. In this paper, we initiate the study of dimension reduction for Mahalanobis metrics. In particular, we provide efficient data structures for solving the Approximate Distance Estimation (ADE) problem for Mahalanobis distances. We first provide a randomized Monte Carlo data structure. Then, we show how to adapt it into our main data structure, which handles sequences of adaptive queries as well as online updates to both the Mahalanobis metric matrix and the data points, making it suitable for use in conjunction with prior algorithms for online learning of Mahalanobis metrics.
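
As a rough illustration of the sketching idea described above, the following minimal Python sketch estimates Mahalanobis distances $\|U(q - x_i)\|_2$ from a Johnson-Lindenstrauss projection. It is a sketch under stated assumptions, not the authors' data structure: the Gaussian projection `Pi`, the sizes `n`, `d`, `k`, `m`, and all variable names are illustrative choices, and it covers only a single non-adaptive sketch, not the adaptive-query or online-update machinery.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes (assumptions): n points in d dimensions,
# metric factor U of shape (k, d), sketch dimension m < k.
n, d, k, m = 200, 256, 256, 64

U = rng.standard_normal((k, d))   # metric factor, A = U^T U
X = rng.standard_normal((n, d))   # data points stored as rows
q = rng.standard_normal(d)        # query point

# Exact Mahalanobis distances: d_A(q, x_i) = ||U (q - x_i)||_2.
exact = np.linalg.norm((q - X) @ U.T, axis=1)

# Gaussian JL matrix, scaled so that E||Pi z||^2 = ||z||^2.
Pi = rng.standard_normal((m, k)) / np.sqrt(m)

# Precompute the sketched points once: row i is Pi U x_i (length m).
PiU = Pi @ U
sketched_X = X @ PiU.T

# Query time: sketch q, then measure distances in the m-dimensional space.
approx = np.linalg.norm(q @ PiU.T - sketched_X, axis=1)

# Relative error of each estimate against the exact distance.
rel_err = np.abs(approx - exact) / exact
print("max relative error:", rel_err.max())
```

Once `sketched_X` is stored, each query works with `m`-dimensional vectors instead of `k`-dimensional ones, which is where the savings come from. A larger `m` tightens the estimates at the cost of memory and query time.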

Paper Structure

This paper contains 27 sections, 13 theorems, 27 equations, 4 figures, 5 algorithms.

Key Result

Lemma 5

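Let $X_{1}, \ldots, X_{n}$ be independent random variables such that $X_{i} \in\left[a_{i}, b_{i}\right]$ almost surely for $i \in[n]$, and let $S=\sum_{i=1}^{n}\left(X_{i}-\mathbb{E}[X_{i}]\right)$. Then, for every $t>0$: $$\Pr\left[|S| \geq t\right] \leq 2\exp\left(-\frac{2t^{2}}{\sum_{i=1}^{n}\left(b_{i}-a_{i}\right)^{2}}\right).$$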

Figures (4)

  • Figure 1: Benchmark results for (a) QueryAll accuracy under different $m$. (b) QueryAll time under different $m$.
  • Figure 2: Benchmark results for (a) QueryPair time under different $m$. (b) Data structure memory usage under different $m$.
  • Figure 3: Benchmark results for (a) QueryAll accuracy under different $m$. (b) QueryAll time under different $m$.
  • Figure 4: Benchmark results for (a) QueryPair time under different $m$. (b) Data structure memory usage under different $m$.

Theorems & Definitions (26)

  • Definition 1: Approximate Mahalanobis Distance Estimation
  • Definition 2: Online Adaptive Mahalanobis Distance Estimation
  • Definition 3: Low-Rank Mahalanobis Pseudo-Metric
  • Definition 4: $(\varepsilon,\beta)$-representative (Definition B.2 in [CN20])
  • Lemma 5: Hoeffding’s Inequality [H63]
  • Lemma 6
  • Theorem 7: Guarantees for JL sketch for Mahalanobis Metrics
  • proof
  • Lemma 8: Distributional JL (DJL) lemma
  • Lemma 9: Gaussian Annulus Theorem
  • ...and 16 more