Compressive Mahalanobis Metric Learning Adapts to Intrinsic Dimension

Efstratios Palias, Ata Kabán

TL;DR

The paper tackles Mahalanobis metric learning in high dimensions and proposes training on randomly compressed data using Gaussian projections. It proves two main theoretical guarantees: a generalisation bound for the compressed metric class and an excess empirical error bound, both governed by the stable (intrinsic) dimension of the data rather than the ambient dimension, and it extends Gordon's theorem to arbitrary domains. The results show a favourable trade-off: a smaller projection dimension $k$ reduces complexity but increases distortion, and the bounds tighten when the stable dimension is low. Empirical validation on synthetic ellipsoids and benchmark datasets confirms that metric learning in the compressed space can adapt to intrinsic structure while achieving performance competitive with or superior to that of the Euclidean metric, especially under high noise.

Abstract

Metric learning aims at finding a suitable distance metric over the input space, to improve the performance of distance-based learning algorithms. In high-dimensional settings, it can also serve as dimensionality reduction by imposing a low-rank restriction to the learnt metric. In this paper, we consider the problem of learning a Mahalanobis metric, and instead of training a low-rank metric on high-dimensional data, we use a randomly compressed version of the data to train a full-rank metric in this reduced feature space. We give theoretical guarantees on the error for Mahalanobis metric learning, which depend on the stable dimension of the data support, but not on the ambient dimension. Our bounds make no assumptions aside from i.i.d. data sampling from a bounded support, and automatically tighten when benign geometrical structures are present. An important ingredient is an extension of Gordon's theorem, which may be of independent interest. We also corroborate our findings by numerical experiments.
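
The compress-then-learn pipeline described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the $N(0, 1/k)$ scaling of the projection entries, and the random matrix `M` standing in for a learnt full-rank metric are all assumptions made for the example. No particular metric learning algorithm is shown, since this summary does not specify one.

```python
import numpy as np

def gaussian_projection(d, k, rng):
    """Draw a k x d matrix with i.i.d. Gaussian entries.

    The N(0, 1/k) scaling is an assumption chosen so that squared norms
    are preserved in expectation; the paper may normalise differently.
    """
    return rng.normal(0.0, 1.0 / np.sqrt(k), size=(k, d))

def mahalanobis_sq(u, v, M):
    """Squared Mahalanobis distance (u - v)^T M (u - v) for a PSD matrix M."""
    diff = u - v
    return float(diff @ M @ diff)

rng = np.random.default_rng(0)
d, k, n = 100, 10, 50            # ambient dim, projection dim, sample size

X = rng.normal(size=(n, d))      # placeholder high-dimensional data
R = gaussian_projection(d, k, rng)
Z = X @ R.T                      # compressed data, shape (n, k)

# Hypothetical stand-in for a learnt metric: any M = L^T L is PSD.
L = rng.normal(size=(k, k))
M = L.T @ L

dist = mahalanobis_sq(Z[0], Z[1], M)
```

In an actual experiment, `M` would be produced by a metric learning algorithm trained on the compressed pairs `Z`, so the number of parameters scales with $k^2$ rather than $d^2$.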

Paper Structure

This paper contains 8 sections, 6 theorems, 38 equations, 2 figures.

Key Result

Lemma 3

For any set $T\subset\mathbb R^d$,

Figures (2)

  • Figure 1: Out-of-sample error of $1$-NN on compressed synthetic data sets, with metric learning, averaged over $10$ Gaussian random projections, for several choices of $d$. The data support was $\mathcal{X}=A{\mathcal{S}}^{d-1}$, where $A\in\mathbb{R}^{d\times d}$ is a diagonal matrix, and the titles of the subplots show its $i$-th diagonal element, for $i\in[d]$. The legends show the projection dimension, $k$.
  • Figure 2: Out-of-sample error of $1$-NN classification with metric learning (solid lines) and with the Euclidean metric (dashed lines), on benchmark UCI data sets. All data sets were normalised to $[0,1]$, embedded into $100$ dimensions, and had i.i.d. Gaussian random noise of variance $\gamma$ (shown in the legend) added to each of their instances. A train/test ratio of $80\%/20\%$ was used. The curves represent averages over $10$ independent Gaussian random projections. The error bars show intervals of one standard error.
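
The synthetic support $\mathcal{X}=A\mathcal{S}^{d-1}$ in Figure 1 can be sampled along the following lines. This is a sketch under stated assumptions: the decay profile `1 / i` for the diagonal of $A$ is hypothetical (the actual profiles are given in the subplot titles, which are not reproduced here), and drawing uniformly from the sphere before scaling is one plausible choice of distribution on the ellipsoid, not necessarily the one used in the paper.

```python
import numpy as np

def sample_ellipsoid(n, diag_a, rng):
    """Sample n points from A S^{d-1}, where A = diag(diag_a).

    Points are drawn uniformly on the unit sphere (normalised Gaussians)
    and then scaled coordinate-wise by the diagonal of A.
    """
    g = rng.normal(size=(n, diag_a.shape[0]))
    s = g / np.linalg.norm(g, axis=1, keepdims=True)   # points on S^{d-1}
    return s * diag_a                                  # points on A S^{d-1}

rng = np.random.default_rng(0)
d = 100
diag_a = 1.0 / np.arange(1, d + 1)   # hypothetical decay: a_i = 1 / i
X = sample_ellipsoid(200, diag_a, rng)
```

A fast-decaying diagonal concentrates the ellipsoid along a few axes, which is the kind of low intrinsic dimension the bounds are designed to exploit.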

Theorems & Definitions (10)

  • Definition 1: Gaussian width [vershynin2018high]
  • Definition 2: Stable dimension [vershynin2018high]
  • Lemma 3 [vershynin2018high]
  • Lemma 4
  • Lemma 5: Sudakov-Fernique's inequality [vershynin2018high]
  • Theorem 6: Compressed generalisation error
  • Lemma 7: Rademacher bound [bartlett2002rademacher]
  • Remark 8
  • Theorem 9: Excess empirical error
  • Remark 10
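
To make the quantities in Definitions 1 and 2 concrete, the sketch below estimates the Gaussian width $w(T)=\mathbb{E}\sup_{x\in T}\langle g,x\rangle$, with $g\sim N(0,I_d)$, of a finite point set by Monte Carlo. It then forms the ratio $w(T)^2/\operatorname{diam}(T)^2$, which is used here only as a proxy: it agrees with the stable dimension in the cited textbook up to constant factors, and the paper's exact normalisation is not reproduced in this summary. Function names and the example point set are assumptions.

```python
import numpy as np

def gaussian_width_mc(T, n_draws, rng):
    """Monte Carlo estimate of E sup_{x in T} <g, x> for a finite set T (rows)."""
    G = rng.normal(size=(n_draws, T.shape[1]))   # one Gaussian vector per row
    return float(np.mean(np.max(G @ T.T, axis=1)))

def diameter(T):
    """Largest pairwise Euclidean distance between rows of T."""
    sq = np.sum(T ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (T @ T.T)
    return float(np.sqrt(max(d2.max(), 0.0)))    # clip tiny negative round-off

rng = np.random.default_rng(0)
d = 50
scales = 1.0 / np.arange(1, d + 1)               # hypothetical decaying axes
T = rng.normal(size=(200, d)) * scales           # example point set

w = gaussian_width_mc(T, 5000, rng)
proxy = w ** 2 / diameter(T) ** 2                # stable-dimension proxy
```

For point sets concentrated near a low-dimensional subspace, this proxy stays small even when the ambient dimension `d` is large.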