Dimension-Accuracy Tradeoffs in Contrastive Embeddings for Triplets, Terminals & Top-k Nearest Neighbors
Vaggos Chatziafratis, Piotr Indyk
TL;DR
This work investigates how well low-dimensional Euclidean embeddings can preserve the relative order of pairwise distances among $n$ items, as specified by triplet comparisons. It introduces the triplet dimension and proves near-tight lower bounds showing that exact triplet preservation may require dimension $d > \frac{n}{2+\kappa}$ in the worst case, while $n-1$ dimensions always suffice. It also analyzes relaxed ordinal embeddings, establishing nontrivial lower bounds on the relaxation factor that are tight up to $\log\log n$ factors, and extends the study to Terminal and Top-$k$-NN ordinal embeddings, with upper and lower bounds that depend on the number of terminals $k$ and on the regime of $k$ relative to $n$. The lower bounds combine high-girth graph constructions, edge sampling, and polynomial sign-pattern counting, while simple, constructive embeddings yield tight or near-tight upper bounds in the terminal setting. Collectively, the results identify fundamental dimensionality barriers for order-preserving representations and inform the practical design of ordinal/contrastive embeddings for tasks such as nearest-neighbor search, ranking, and crowdsourced similarity assessment.
Abstract
Metric embeddings traditionally study how to map $n$ items to a target metric space such that distances are not heavily distorted; but what if we only care about preserving the relative order of the distances (and not their magnitudes)? In this paper, we are motivated by the following basic question: given triplet comparisons of the form "item $i$ is closer to item $j$ than to item $k$," can we find low-dimensional Euclidean representations for the $n$ items that respect those distance comparisons? Such order-preserving embeddings arise naturally in important applications and have been studied since the 1950s under the name of ordinal or non-metric embeddings. Our main results are:
1. Nearly-Tight Bounds on Triplet Dimension: We introduce the natural concept of the triplet dimension of a dataset and, surprisingly, show that for an ordinal embedding to be triplet-preserving, its dimension needs to grow as $\frac n2$ in the worst case. This is optimal up to a constant factor, as $n-1$ dimensions always suffice.
2. Tradeoffs for Dimension vs (Ordinal) Relaxation: We then relax the requirement that every triplet be exactly preserved and present almost tight lower bounds for the maximum ratio between distances whose relative order was inverted by the embedding; this ratio is known as (ordinal) relaxation in the literature and serves as a counterpart to (metric) distortion.
3. New Bounds on Terminal and Top-$k$-NNs Embeddings: Going beyond triplets, we then study two well-motivated scenarios where we care about preserving specific sets of distances (not necessarily triplets). The first scenario is Terminal Ordinal Embeddings and the second is Top-$k$-NNs Ordinal Embeddings.
To the best of our knowledge, these are some of the first tradeoffs on triplet-preserving ordinal embeddings and the first study of Terminal and Top-$k$-NNs Ordinal Embeddings.
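To make the triplet-preservation requirement concrete, the following is a minimal Python sketch of how one might check whether a given Euclidean embedding respects a set of triplet comparisons. It is illustrative only and is not taken from the paper: the function name `violated_triplets`, the convention that a triplet `(i, j, k)` means "item $i$ is closer to item $j$ than to item $k$," and the toy data are all assumptions.

```python
from math import dist  # Euclidean distance, available in Python 3.8+


def violated_triplets(points, triplets):
    """Return the triplets (i, j, k) that the embedding fails to preserve.

    Assumed convention: (i, j, k) encodes "item i is closer to item j than
    to item k", so it is preserved when d(x_i, x_j) < d(x_i, x_k) strictly.
    """
    return [
        (i, j, k)
        for (i, j, k) in triplets
        if not dist(points[i], points[j]) < dist(points[i], points[k])
    ]


# Hypothetical toy data: three items embedded on a line (dimension 1).
points = [(0.0,), (1.0,), (3.0,)]
triplets = [(0, 1, 2), (2, 1, 0), (1, 2, 0)]

# Only (1, 2, 0) is violated: item 1 is at distance 2 from item 2
# but at distance 1 from item 0.
print(violated_triplets(points, triplets))
```

An embedding is triplet-preserving for a given set of comparisons exactly when this check returns an empty list; the results above concern how small the dimension of `points` can be while that remains possible.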
