Structure-aware divergences for comparing probability distributions

Rohit Sahasrabuddhe; Renaud Lambiotte

Abstract

Many natural and social science systems are described using probability distributions over elements that are related to each other: for instance, occupations with shared skills or species with similar traits. Standard information theory quantities such as entropies and $f$-divergences treat elements interchangeably and are blind to the similarity structure. We introduce a family of divergences that are sensitive to the geometry of the underlying domain. By virtue of being the Bregman divergences of structure-aware entropies, they provide a framework that retains several advantages of Kullback-Leibler divergence and Shannon entropy. Structure-aware divergences recover planted patterns in a synthetic clustering task that conventional divergences miss and are orders of magnitude faster than optimal transport distances. We demonstrate their applicability in economic geography and ecology, where structure plays an important role. Modelling different notions of occupation relatedness yields qualitatively different regionalisations of their geographic distribution. Our methods also reproduce established insights into functional $\beta$-diversity in ecology obtained with optimal transport methods.

Paper Structure (45 sections, 7 theorems, 23 equations, 11 figures, 3 tables)

This paper contains 45 sections, 7 theorems, 23 equations, 11 figures, 3 tables.

Introduction
Results
Preliminaries
Structure-aware entropy
Structure-aware divergence
Positive definite similarity matrices
Synthetic experiments
Recovering planted partitions
Runtime compared to Optimal Transport
The geography of occupations in England and Wales
Regionalisation
$\beta$-diversity of vegetation in the Rutor glacier
Discussion
Methods
Structure-aware entropy
...and 30 more sections

Key Result

Theorem 1

$\mathcal{H}^{\mathbf{Z}}_{\alpha}$ is strictly concave in $\Delta_n^\circ$ for $\alpha \geq 2$ if $\mathbf{Z} \succ 0$.
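To make the $\alpha = 2$ case concrete, here is a minimal numerical sketch. It rests on an assumption that is not stated in this summary: that at $\alpha = 2$ the structure-aware entropy takes the quadratic form $1 - \mathbf{p}^\top \mathbf{Z} \mathbf{p}$. Under that assumption, strict concavity for $\mathbf{Z} \succ 0$ is immediate, and the associated Bregman divergence reduces to $(\mathbf{p}-\mathbf{q})^\top \mathbf{Z} (\mathbf{p}-\mathbf{q})$. The similarity matrix, the within-pair similarity `s`, the three distributions, and the function names are all hypothetical, chosen to mimic the two-level hierarchy described in the Figure 1 caption.

```python
import numpy as np


def quadratic_entropy(p, Z):
    """Assumed alpha = 2 entropy: 1 - p^T Z p (not taken from the paper)."""
    p = np.asarray(p, dtype=float)
    return 1.0 - p @ Z @ p


def bregman_divergence(p, q, Z):
    """Bregman divergence generated by F(p) = p^T Z p, i.e. by -entropy."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    grad_F_q = 2.0 * Z @ q  # gradient of q^T Z q for symmetric Z
    return (p @ Z @ p) - (q @ Z @ q) - grad_F_q @ (p - q)


# Hypothetical two-level hierarchy: elements (0, 1) and (2, 3) form pairs.
s = 0.5  # assumed within-pair similarity
Z = np.array([
    [1.0, s, 0.0, 0.0],
    [s, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, s],
    [0.0, 0.0, s, 1.0],
])
I = np.eye(4)  # structure-blind baseline

# Z must be positive definite for the concavity result to apply.
assert np.all(np.linalg.eigvalsh(Z) > 0)

blue = np.array([0.4, 0.4, 0.1, 0.1])    # mass concentrated on one pair
orange = np.array([0.4, 0.1, 0.4, 0.1])  # mass on one element of each pair
uniform = np.full(4, 0.25)

for name, p in [("blue", blue), ("orange", orange)]:
    print(
        name,
        "entropy (Z):", round(quadratic_entropy(p, Z), 3),
        "divergence from uniform (Z):", round(bregman_divergence(p, uniform, Z), 3),
        "divergence from uniform (I):", round(bregman_divergence(p, uniform, I), 3),
    )
```

With these hypothetical inputs, the orange distribution has the higher entropy and the smaller divergence from uniform under `Z`, while the identity matrix `I` assigns both distributions the same divergence.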

Figures (11)

Figure 1: Illustrative example. $\mathbf{Z}$ is a similarity matrix representing a two-level hierarchy where four elements group into 'pairs' (top left). We consider three distributions over them: one where both elements of a pair have high probability (blue), one where one element from each pair has high probability (orange), and the uniform distribution (green; top right). The orange distribution has a higher entropy than the blue one since its high-probability elements are dissimilar (bottom left; note the log scale for $\alpha$). Similarly, the orange distribution diverges less from the uniform distribution than the blue one does (bottom right; note that divergence is only defined for $\alpha\geq2$). Entropies and divergences blind to structure would not differentiate between the two.
Figure 2: Synthetic experiments. A. Planted patterns. (Top) The embedding in $\mathbb{R}^2$, with ground truth groups represented as different shapes. We show a typical sample with $m=4$, with each row corresponding to a distribution. (Bottom left) The clustering quality as a function of $k$ for $\mathbf{I}$ (blue) and $\mathbf{Z}$ (orange). The points mark the results from 50 runs and the lines connect the median values for $m=2$ (dashed) and $m=16$ (solid). (Bottom right) The alignment with ground truth, measured using Adjusted Mutual Information (AMI), against $m$. We report the median and 95% confidence intervals for $k=2$ (dotted) and $k=3$ (dot-dashed). B. Runtime. The runtime in seconds as a function of the number of input distributions for the all-pairs dissimilarity task. The points mark the results from 50 runs each for OT (blue), J-BD (naive) (orange), and J-BD (fast) (green). The lines connect the medians. The J-BD methods use $\alpha=2$. Note the log scale on both axes.
Figure 3: Statistics for regionalisation by occupational composition. (Left) Clustering quality as a function of the number of clusters $k$ for occupation similarity matrices modelling no similarity (blue), skills similarity (orange), and co-location similarity (green). The clustering quality of a partition is the fraction of total Bregman information explained by it. (Right) The AMI between the optimal partitions for each $k$ for each pair of similarity matrices.
Figure 4: Three-way partitions by occupational composition. (Left) The geography of the optimal partitions into three regions using no occupation similarity (blue), skills similarity (orange), and co-location similarity (green). We distinguish the regions by intensity of colour (light, medium, and dark). (Right) The occupational composition of each cluster for skills (top row) and co-location (bottom row) similarity. Each point is an occupation, with size corresponding to employment share in the total population. We label a few distinctive occupations (see end of caption for abbreviations). Their positions are set by Multi-Dimensional Scaling (MDS) of the similarity structure, so that similar occupations are closer. Each plot corresponds to the region indicated by the stripe on its left margin. The colour of an occupation represents the Revealed Comparative Advantage (RCA) of the corresponding region in it. Blue and red indicate over- and under-representation, respectively. Abbreviations: Exec. is Chief Executives and Senior Officials, Fin. is Finance Professionals, Mach. is Plant and Machine Operatives, Stor. is Elementary Storage Occupations, and Agri. is Agricultural and Related Trades.
Figure 5: $\beta$-diversity of the successional stages of the Rutor glacier. We plot the taxonomic (left) and functional (right) $\beta$-diversity of the three successional stages (rows). We measure the $\beta$-diversity of a stage as its Bregman information. The distributions are the $\beta$-diversities of 1,000 synthetic stages with randomly sampled plots. The dashed black lines mark the empirical values. These results are for $\alpha=2$.
...and 6 more figures

Theorems & Definitions (18)

Definition 1
Definition 2
Definition 3
Definition 4
Theorem 1
Definition 5
Definition 6
Proposition 1
Proposition 2
Lemma 1
...and 8 more
