Online Adaptive Mahalanobis Distance Estimation
Lianke Qin, Aravind Reddy, Zhao Song
TL;DR
This work introduces the first dimension-reduction framework for Mahalanobis distances: an online, adaptive Approximate Distance Estimation (ADE) data structure built on Johnson-Lindenstrauss sketches. It estimates the distance from a query to every data point, remains correct when queries are chosen adaptively, and supports online updates to both the data and the metric, represented as $A=U^\top U$, while returning $(1\pm\varepsilon)$-accurate results with high probability. The authors give a Monte Carlo data structure and an online maintenance variant, with formal guarantees for the Initialize, UpdateU, UpdateX, QueryPair, QueryAll, and SampleExact operations, and they evaluate accuracy, running time, and memory usage on benchmark datasets. By combining sketching techniques with online metric updates, the work provides a theoretical and empirical basis for sketch-based Mahalanobis distance estimation in online metric learning and in real-time learning and decision systems where the metric evolves over time.
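As a reading aid, the identity behind the factorization $A=U^\top U$ can be written out explicitly. The symbols $x, y \in \mathbb{R}^d$, $U \in \mathbb{R}^{k\times d}$, the sketch matrix $\Pi \in \mathbb{R}^{m\times k}$, and the notation $d_A$ are assumed here for illustration and are not fixed by the summary above:

$$
d_A(x,y) \;=\; \sqrt{(x-y)^\top A\,(x-y)} \;=\; \sqrt{(x-y)^\top U^\top U\,(x-y)} \;=\; \|U(x-y)\|_2 .
$$

A Mahalanobis distance is therefore a Euclidean distance after the linear map $U$, so a Johnson-Lindenstrauss sketch can be applied to the mapped points. For a fixed pair $x, y$ and a suitable random $\Pi$ with $m$ on the order of $\varepsilon^{-2}\log(1/\delta)$ rows, the standard guarantee is that, with probability at least $1-\delta$,

$$
(1-\varepsilon)\, d_A(x,y) \;\le\; \|\Pi U x - \Pi U y\|_2 \;\le\; (1+\varepsilon)\, d_A(x,y).
$$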
Abstract
Mahalanobis metrics are widely used in machine learning in conjunction with methods such as $k$-nearest neighbors, $k$-means clustering, and $k$-medians clustering. Despite their importance, there has been no prior work on applying sketching techniques to speed up algorithms for Mahalanobis metrics. In this paper, we initiate the study of dimension reduction for Mahalanobis metrics. In particular, we provide efficient data structures for solving the Approximate Distance Estimation (ADE) problem for Mahalanobis distances. We first provide a randomized Monte Carlo data structure. We then adapt it to obtain our main data structure, which handles sequences of \textit{adaptive} queries as well as online updates to both the Mahalanobis metric matrix and the data points, making it suitable for use with prior algorithms for online learning of Mahalanobis metrics.
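The basic idea of sketching Mahalanobis distances can be illustrated with a minimal Python sketch. It is not the authors' data structure: it uses a single Gaussian Johnson-Lindenstrauss matrix, so it corresponds at most to the simple Monte Carlo setting and gives no guarantee against adaptive queries. The class name `SketchedMahalanobis`, the method names `update_x` and `query_all`, and the choice of a dense Gaussian sketch are all assumptions made for illustration.

```python
import math
import random


def matvec(M, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in M]


def mahalanobis(U, x, y):
    """Exact distance ||U (x - y)||_2 for the metric A = U^T U."""
    diff = [a - b for a, b in zip(x, y)]
    return math.sqrt(sum(t * t for t in matvec(U, diff)))


def gaussian_sketch(m, k, rng):
    """m x k matrix with i.i.d. N(0, 1/m) entries (a standard JL sketch)."""
    scale = 1.0 / math.sqrt(m)
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(k)] for _ in range(m)]


class SketchedMahalanobis:
    """Illustrative (hypothetical) structure: stores Pi U x_i for each point."""

    def __init__(self, U, points, m, seed=0):
        rng = random.Random(seed)
        self.U = U
        # Pi has m rows and one column per row of U (assumed k x d).
        self.Pi = gaussian_sketch(m, len(U), rng)
        self.sketches = [self._sketch(x) for x in points]

    def _sketch(self, x):
        # Map x into the metric's Euclidean space, then compress to m dims.
        return matvec(self.Pi, matvec(self.U, x))

    def update_x(self, i, z):
        """Replace data point i by z; only its own sketch is recomputed."""
        self.sketches[i] = self._sketch(z)

    def query_all(self, q):
        """Estimate the Mahalanobis distance from q to every stored point."""
        sq = self._sketch(q)
        return [
            math.sqrt(sum((a - b) ** 2 for a, b in zip(sq, s)))
            for s in self.sketches
        ]
```

Each query costs one sketch of the query point plus work proportional to $m$ per stored point, instead of work proportional to the full dimension. Handling adaptive queries and updates to $U$ requires the additional machinery developed in the paper, which this sketch does not attempt.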
