Embedding Probability Distributions into Low Dimensional $\ell_1$: Tree Ising Models via Truncated Metrics

Moses Charikar; Spencer Compton; Chirag Pabbaraju

TL;DR

The paper investigates when high-dimensional ℓ1 distances can be embedded into low-dimensional ℓ1 spaces, focusing on distance metrics induced by tree-structured binary distributions. By viewing ℓ1 metrics as disagreements under a distribution and introducing a truncation framework, the authors show that distances arising from Tree Ising Models can be embedded into ℓ1 with constant distortion in polylogarithmic dimensions, via both fixed-cap and Lipschitz-cap truncated-tree metrics. They develop and combine tools for truncated line and tree metrics, including a novel Build-Clean approach, caterpillar decompositions, and a Bernoulli-randomness framework to handle external fields. Beyond the TIM setting, they prove general truncation results for arbitrary ℓ1 metrics, achieving near-optimal dimension blowups, and discuss lower bounds from treewidth-3 networks, clarifying the boundary between tractable and intractable embedding problems. The results provide a structured pathway for low-dimensional representations of distribution-induced ℓ1 metrics and introduce techniques with potential applications in metric embedding and graphical-model analysis.

Abstract

Given an arbitrary set of high dimensional points in $\ell_1$, there are known negative results that preclude the possibility of always mapping them to a low dimensional $\ell_1$ space while preserving distances with small multiplicative distortion. This is in stark contrast with dimension reduction in Euclidean space ($\ell_2$) where such mappings are always possible. While the first non-trivial lower bounds for $\ell_1$ dimension reduction were established almost 20 years ago, there has been limited progress in understanding what sets of points in $\ell_1$ are conducive to a low-dimensional mapping. In this work, we study a new characterization of $\ell_1$ metrics that are conducive to dimension reduction in $\ell_1$. Our characterization focuses on metrics that are defined by the disagreement of binary variables over a probability distribution -- any $\ell_1$ metric can be represented in this form. We show that, for configurations of $n$ points in $\ell_1$ obtained from tree Ising models, we can reduce dimension to $\mathrm{polylog}(n)$ with constant distortion. In doing so, we develop technical tools for embedding truncated metrics which have been studied because of their applications in computer vision, and are objects of independent interest in metric geometry. Among other tools, we show how any $\ell_1$ metric can be truncated with $O(1)$ distortion and $O(\log(n))$ blowup in dimension.
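The link between disagreement metrics and $\ell_1$ can be checked numerically on a toy example. The sketch below is illustrative only and is not taken from the paper: the function names (`disagreement`, `l1_embedding`, `l1`) and the example distribution are assumptions. It maps variable $i$ to a vector with one coordinate per outcome $x$, equal to $p(x)$ when $x_i = 1$ and $0$ otherwise, so the $\ell_1$ distance between two such vectors is exactly $\Pr[X_i \ne X_j]$. This embedding is exponentially high-dimensional; it does not perform any dimension reduction.

```python
from itertools import product

def disagreement(dist, i, j):
    """Pr_{X ~ dist}[X_i != X_j], where dist maps outcome tuples to probabilities."""
    return sum(p for x, p in dist.items() if x[i] != x[j])

def l1_embedding(dist, i):
    """One coordinate per outcome x: p(x) if x_i == 1, else 0."""
    return [p * x[i] for x, p in sorted(dist.items())]

def l1(u, v):
    """Plain l1 distance between two equal-length vectors."""
    return sum(abs(a - b) for a, b in zip(u, v))

# Hypothetical distribution over 3 binary variables (weights chosen arbitrarily).
weights = {x: 1 + x[0] + 2 * x[1] * x[2] for x in product((0, 1), repeat=3)}
total = sum(weights.values())
dist = {x: w / total for x, w in weights.items()}

# Largest gap between the l1 distance and the disagreement probability.
max_gap = max(
    abs(l1(l1_embedding(dist, i), l1_embedding(dist, j)) - disagreement(dist, i, j))
    for i in range(3)
    for j in range(3)
)
print(max_gap)
```

Each coordinate contributes $|p(x) x_i - p(x) x_j|$, which is $p(x)$ when the two variables disagree on $x$ and $0$ otherwise, so the gap is zero up to floating-point error.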

Paper Structure (32 sections, 31 theorems, 191 equations, 11 figures, 1 table, 2 algorithms)

Introduction
Overview of Results
Embedding Probability Distributions
Truncated Tree Metrics
General Truncated $\ell_1$ Metrics
Preliminaries and Notation
Notation
Tree Ising Models
Symmetric Tree Ising Models
Metric Spaces of Interest
Fixed Cap Metrics
Lipschitz Cap Metrics
Caterpillar Tree Decomposition
Symmetric Tree Ising Models
Symmetric Tree Ising Models Reduce to Fixed Cap Tree Metrics
...and 17 more sections

Key Result

Theorem 1

Any distance metric over tree Ising models (i.e., $d_D(i,j) = c \cdot \operatorname{Pr}_{X \sim D}[X_i \ne X_j]$, where $D$ is a tree Ising model distribution on $n$ random variables $X_1,\dots,X_n$) can be embedded into $(\mathbb{R}^{O(\log^2(n))}, \ell_1)$ with $\Theta(1)$ distortion.
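To make the metric $d_D$ concrete, here is a minimal sketch for the zero-external-field case, assuming this corresponds to what the paper calls a symmetric tree Ising model. It only computes the distances; it is not the paper's embedding. The representation is an assumption: the root is uniform on $\{0,1\}$, `parent[v]` gives the tree structure, and `flip[v]` is a hypothetical per-edge probability that $X_v$ differs from its parent. Under these assumptions the standard identity $\Pr[X_i \ne X_j] = \tfrac{1}{2}\bigl(1 - \prod_{e \in \text{path}(i,j)} (1 - 2 p_e)\bigr)$ holds, and the code checks it against brute-force enumeration with $c = 1$.

```python
from itertools import product

def path_edges(parent, i, j):
    """Edges on the tree path between i and j, each named by its child endpoint."""
    def ancestors(v):
        chain = [v]
        while parent[chain[-1]] is not None:
            chain.append(parent[chain[-1]])
        return chain

    up_i, up_j = ancestors(i), ancestors(j)
    common = set(up_i) & set(up_j)
    edges = []
    for chain in (up_i, up_j):
        for v in chain:
            if v in common:  # reached the lowest common ancestor
                break
            edges.append(v)
    return edges

def disagreement_closed_form(parent, flip, i, j):
    """(1 - product of (1 - 2 * p_e) over path edges) / 2."""
    corr = 1.0
    for v in path_edges(parent, i, j):
        corr *= 1.0 - 2.0 * flip[v]
    return (1.0 - corr) / 2.0

def disagreement_brute_force(parent, flip, i, j):
    """Sum the joint probability of every assignment with x_i != x_j."""
    n = len(parent)
    total = 0.0
    for x in product((0, 1), repeat=n):
        p = 0.5  # uniform root, no external field
        for v in range(n):
            if parent[v] is not None:
                p *= flip[v] if x[v] != x[parent[v]] else 1.0 - flip[v]
        if x[i] != x[j]:
            total += p
    return total

# Hypothetical 5-node tree: node 0 is the root; flip[0] is unused.
parent = [None, 0, 0, 1, 1]
flip = [None, 0.1, 0.3, 0.2, 0.05]

for i in range(5):
    for j in range(5):
        a = disagreement_closed_form(parent, flip, i, j)
        b = disagreement_brute_force(parent, flip, i, j)
        assert abs(a - b) < 1e-12
```

Models with external fields do not satisfy this product formula in general, which is consistent with the summary's note that a separate framework is needed to handle them.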

Figures (11)

Figure 1: Lazy snaking with fixed cap.
Figure 2: Bernoulli randomness and conditional Bernoulli randomness as "boxed" mass.
Figure 3: The ratio of blue and green mass within $X_j=0$ and $X_j=1$ is preserved when it gets distributed between $X_k=0$ and $X_k=1$. The picture above only shows this for the (purple) mass in state $X_j=1$. However, it is also true for the mass within state $X_j=0$ that would occupy the leftover white regions in $X_k$.
Figure 4: The case where $\operatorname{Br}(X_i | X_j) < 2 \cdot b(i), \frac{1}{2}$. Each $X_{i+1},\dots,X_j$ has unmatched blue/green mass in each state, totalling more than $\frac{1}{2}$. The boxed mass in each state has equal green and blue mass. Toggles and crosses are in one-to-one correspondence in this case. In the picture above, we are imagining there to be an odd number of toggles on the path $X_i \to X_j$. Observe how this results in all of the unmatched mass in $X_j$ contributing to $\operatorname{Pr}[X_i \neq X_j]$. In particular, for $X_j$ (for which the number of toggles is odd), $\operatorname{Pr}[X_i \neq X_j]=1-\frac{1}{2}\operatorname{Br}[X_i|X_j]$, while for $X_{k+1}$ (for which the number of toggles is even), $\operatorname{Pr}[X_i \neq X_{k+1}]=\frac{1}{2}\operatorname{Br}[X_i|X_{k+1}]$.
Figure 5: The case where $\operatorname{Br}(X_i | X_j) = 2 \cdot b(i)$. The mass corresponding to the smaller state in $X_i$ (green mass) should all be in a box in $X_j$.
...and 6 more figures

Theorems & Definitions (97)

Theorem 1: Embedding Tree Ising Models
Claim 2.0: Treewidth 3 lower bound
Theorem 2: Fixed cap tree into $\ell_1$
Theorem 3: Embedding Lipschitz-Capped Trees
Theorem 4: General Truncated $\ell_1$ Metrics
Lemma 4.1: symmetric tree Ising model with no "bad" edges $\to$ fixed cap tree
proof
Claim 4.2
proof
Claim 4.3
...and 87 more
