Compressive Mahalanobis Metric Learning Adapts to Intrinsic Dimension
Efstratios Palias, Ata Kabán
TL;DR
The paper tackles Mahalanobis metric learning in high dimensions and proposes training on data that has been randomly compressed with Gaussian projections. It proves two main theoretical guarantees: a generalisation bound for the compressed metric class and an excess empirical error bound, both governed by the stable (intrinsic) dimension of the data rather than the ambient dimension, and it extends Gordon's theorem to arbitrary domains. The bounds expose a trade-off in the projection dimension $k$: a smaller $k$ reduces the complexity of the metric class but increases distortion, and the bounds tighten when the stable dimension is low. Empirical validation on synthetic ellipsoids and benchmark datasets confirms that metric learning in the compressed space can adapt to intrinsic structure while achieving performance competitive with or superior to that of the Euclidean metric, especially under high noise.
Abstract
Metric learning aims at finding a suitable distance metric over the input space, to improve the performance of distance-based learning algorithms. In high-dimensional settings, it can also serve as dimensionality reduction by imposing a low-rank restriction on the learnt metric. In this paper, we consider the problem of learning a Mahalanobis metric, and instead of training a low-rank metric on high-dimensional data, we use a randomly compressed version of the data to train a full-rank metric in this reduced feature space. We give theoretical guarantees on the error for Mahalanobis metric learning, which depend on the stable dimension of the data support, but not on the ambient dimension. Our bounds make no assumptions aside from i.i.d. data sampling from a bounded support, and automatically tighten when benign geometrical structures are present. An important ingredient is an extension of Gordon's theorem, which may be of independent interest. We also corroborate our findings by numerical experiments.
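
The compression step described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names, the sizes `d`, `k`, `n`, the $N(0, 1/k)$ scaling of the projection entries, and the random matrix standing in for a learnt metric are all assumptions made for illustration. The sketch projects the data with a Gaussian matrix $R$, evaluates a full-rank Mahalanobis distance in the $k$-dimensional space, and checks that this distance coincides with the one induced by the rank-$k$ matrix $R^\top M R$ in the ambient space.

```python
import numpy as np


def gaussian_projection(d, k, rng):
    """Return a k x d matrix with i.i.d. N(0, 1/k) entries (assumed scaling)."""
    return rng.normal(0.0, 1.0 / np.sqrt(k), size=(k, d))


def mahalanobis_distance(M, u, v):
    """Distance sqrt((u - v)^T M (u - v)) for a positive semi-definite M."""
    diff = u - v
    return float(np.sqrt(diff @ M @ diff))


rng = np.random.default_rng(0)
d, k, n = 1000, 20, 50           # ambient dimension, projection dimension, sample size

X = rng.normal(size=(n, d))      # illustrative high-dimensional data, one point per row
R = gaussian_projection(d, k, rng)
Z = X @ R.T                      # compressed data, shape (n, k)

# Stand-in for a learnt full-rank metric in the compressed space: M = A^T A.
# A real metric learner would fit A (or M) to labelled pairs of rows of Z.
A = rng.normal(size=(k, k))
M = A.T @ A

dist_compressed = mahalanobis_distance(M, Z[0], Z[1])

# The same distance, written as a rank-k Mahalanobis metric in the ambient space.
M_ambient = R.T @ M @ R
dist_ambient = mahalanobis_distance(M_ambient, X[0], X[1])
```

Under these assumptions the two distances agree up to floating-point error, which makes concrete the sense in which a full-rank metric trained on compressed data plays the role of a low-rank metric on the original data, while only a $k \times k$ matrix has to be learnt.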
