Improved Representation of Asymmetrical Distances with Interval Quasimetric Embeddings

Tongzhou Wang; Phillip Isola

TL;DR

The paper introduces Interval Quasimetric Embedding (IQE), a latent-distance learning approach that represents asymmetrical distances via unions of intervals in a latent space. IQE provides four core properties (respecting quasimetric constraints, universal approximation, low parameter count, and latent positive homogeneity) while enabling simple aggregation via IQE-sum or IQE-maxmean. The authors establish strong universal-approximation guarantees for both finite and general cases and draw connections to prior methods (PQE, MRN, Deep Norm, Wide Norm). Empirically, IQE implementations significantly outperform baselines on large real-world graphs, random graphs, and offline Q-learning tasks, with ablations showing the benefits of the interval-based structure over regularization alone. The work contributes theoretical guarantees, practical algorithms, and a versatile embedding framework that can enhance planning, causal learning, and representation learning in asymmetric geometric settings.
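The interval construction summarized above can be illustrated with a minimal NumPy sketch. It rests on assumptions not spelled out in this summary: each latent is reshaped into `k` components, each component distance is taken to be the length of the union of the intervals `[u_ij, max(u_ij, v_ij)]`, and IQE-maxmean is taken to be a convex combination of the max and the mean of the component distances with a single weight `alpha`. The function names (`union_length`, `iqe_components`, `iqe_sum`, `iqe_maxmean`) are hypothetical and are not the API of the authors' code package.

```python
import numpy as np

def union_length(starts, ends):
    """Total length of the union of the intervals [starts[j], ends[j]]."""
    order = np.argsort(starts)
    total, covered_until = 0.0, -np.inf
    for s, e in zip(starts[order], ends[order]):
        s = max(s, covered_until)      # skip the part already counted
        if e > s:
            total += e - s
            covered_until = e
    return total

def iqe_components(u, v, k):
    """Per-component distances from latent u to latent v (assumed construction)."""
    u = np.asarray(u, dtype=float).reshape(k, -1)
    v = np.asarray(v, dtype=float).reshape(k, -1)
    # Interval j of a component runs from u_ij up to max(u_ij, v_ij),
    # so it is empty whenever v_ij <= u_ij. This is the source of asymmetry.
    return np.array([union_length(ui, np.maximum(ui, vi)) for ui, vi in zip(u, v)])

def iqe_sum(u, v, k):
    """Aggregate components by summation (no trainable parameter)."""
    return iqe_components(u, v, k).sum()

def iqe_maxmean(u, v, k, alpha=0.5):
    """Aggregate with alpha * max + (1 - alpha) * mean; alpha in [0, 1] is assumed."""
    d = iqe_components(u, v, k)
    return alpha * d.max() + (1.0 - alpha) * d.mean()

# Two latents with k = 2 components of 3 dimensions each.
u = np.array([[0.0, 1.0, 2.0], [0.0, 0.0, 0.0]])
v = np.array([[1.0, 3.0, 2.0], [0.0, -1.0, 2.0]])
print(iqe_sum(u, v, k=2))      # 5.0
print(iqe_sum(v, u, k=2))      # 1.0, so the distance is asymmetric
print(iqe_maxmean(u, v, k=2))  # 2.75
```

In this sketch, scaling both latents by a positive constant scales the distance by the same constant, which is one way to read the latent positive homogeneity property named above.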

Abstract

Asymmetrical distance structures (quasimetrics) are ubiquitous in our lives and are gaining more attention in machine learning applications. Imposing such quasimetric structures in model representations has been shown to improve many tasks, including reinforcement learning (RL) and causal relation learning. In this work, we present four desirable properties in such quasimetric models, and show how prior works fail at them. We propose Interval Quasimetric Embedding (IQE), which is designed to satisfy all four criteria. On three quasimetric learning experiments, IQEs show strong approximation and generalization abilities, leading to better performance and improved efficiency over prior methods.

Project Page: https://www.tongzhouwang.info/interval_quasimetric_embedding

Quasimetric Learning Code Package: https://www.github.com/quasimetric-learning/torch-quasimetric

Paper Structure (47 sections, 3 theorems, 10 equations, 6 figures, 2 tables)

Introduction
Latent Structures
Interval Quasimetric Embeddings (IQE)
IQE Components.
Combining IQE Components.
Theoretical Results on Universal Approximation
Relation with PQE.
Relation with MRN.
Deep Norm and Wide Norm.
Related Works
Latent and Representation Learning.
Experiments
Models and A Triangle Inequality Regularizer.
Large-Scale Social Graph
...and 32 more sections

Key Result

Theorem 2

For any finite space $(\mathcal{X}, d)$ with $|\mathcal{X}| = n < \infty$, there exist encoders $f_1, f_2$ such that $(f_1, d_\mathsf{IQE\hbox{-}maxmean})$ exactly represents $d$, and $(f_2, d_\mathsf{IQE\hbox{-}sum})$ approximates $d$ with distortion $\mathcal{O}(t \log^2 n)$, where $t$ is a

Figures (6)

Figure 1: Different latent $d_\mathsf{latent}$. Plots show how predicted distances (and the components forming them) change as two latent vectors move apart. Red bars show the number of trainable parameters in $d_\mathsf{latent}$. (a) PQE suffers from diminishing gradients. (b,c) Deep Norm and Wide Norm require an expensive latent head, and have complex relations between latents and predictions (due to their learned concave transformations). (d) IQE uses a simple head and does not suffer from gradient optimization issues. (a-d) Plots are computed at random initializations, with Deep Norm and Wide Norm concave transformation parameters scaled to emphasize the non-linearity.
Figure 2: Computing IQE from latent $u \in \mathbb{R}^{2\times 3}$ to latent $v \in \mathbb{R}^{2\times 3}$.
Figure 3: Effect of different $(k,l)$ choices for IQEs and PQEs with fixed total latent dimension $=512$.
Figure 4: Modeling graphs of different structures. Deep Norm, Wide Norm and MRN use latent heads with $12{,}500$ more parameters than IQEs ($\leq 1$ parameter) and PQEs. The much simpler IQEs are comparable to or better than them, and outperform all other methods.
Figure 5: Offline goal-conditioned Q-learning results on a simple grid-world with four directional actions. Using different goal-conditioned Q-function models leads to different inductive biases and planning success rates. We use one-step greedy planning with the learned Q-function.
...and 1 more figures

Theorems & Definitions (4)

Definition 1: Quasimetric
Theorem 2: IQE Universal Approximation; Finite Case
Theorem 3: IQE Universal Approximation; General Case
Theorem 4: Deep Norm and Wide Norm Universal Approximation
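
Definition 1 is listed here by name only. For orientation, the standard notion of a quasimetric can be written as below. This is a sketch of the usual textbook definition; whether the paper's own Definition 1 allows infinite distances or additionally requires that $d(x, y) = 0$ implies $x = y$ is an assumption not confirmed by this summary.

```latex
A quasimetric on a set $\mathcal{X}$ is a function
$d \colon \mathcal{X} \times \mathcal{X} \to [0, \infty]$ such that
\begin{align*}
  d(x, x) &= 0                   && \forall x \in \mathcal{X}, \\
  d(x, z) &\le d(x, y) + d(y, z) && \forall x, y, z \in \mathcal{X}.
\end{align*}
Symmetry is not required: $d(x, y) \neq d(y, x)$ is allowed.
```

A metric is the special case in which $d(x, y) = d(y, x)$ also holds for all pairs.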
