Probing the Information Theoretical Roots of Spatial Dependence Measures

Zhangyu Wang; Krzysztof Janowicz; Gengchen Mai; Ivan Majic

TL;DR

Probing the Information Theoretical Roots of Spatial Dependence Measures connects spatial autocorrelation with information theory by deriving a formal, self-information-based interpretation of Moran's I. The authors decompose the Moran-type statistic $\bar{I}$ into a weighted sum of the counts $|S_{p,q}|$ of $pq$-pairs and show that, under mild randomness assumptions, these counts follow binomial or Poisson-binomial distributions that can be approximated by normal distributions. They provide analytical expressions for the approximate mean $\tilde{\mu}_{\bar{I}}$ and variance $\tilde{\sigma}_{\bar{I}}^{2}$ of $\bar{I}$ conditioned on the value scheme $T_M$, along with correction techniques that maintain accuracy when the underlying assumptions are relaxed. Experiments on synthetic and real-world data (including EU slope patches) demonstrate robustness and practical utility and enable the computation of a spatial self-information $J$ that complements traditional Moran-type measures.
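The normal approximation of a Poisson-binomial count can be checked numerically. The sketch below is a generic illustration, not the paper's derivation: it computes the exact distribution of a sum of independent Bernoulli($p_i$) trials by dynamic programming and compares it with a normal distribution of mean $\sum_i p_i$ and variance $\sum_i p_i(1-p_i)$, using a continuity correction. The success probabilities are hypothetical placeholders, not values derived from any value scheme $T_M$; a binomial count is the special case in which all $p_i$ are equal.

```python
import math

def poisson_binomial_pmf(probs):
    """Exact pmf of the number of successes in independent Bernoulli(p_i) trials."""
    pmf = [1.0]  # with zero trials, the count is 0 with probability 1
    for p in probs:
        nxt = [0.0] * (len(pmf) + 1)
        for k, mass in enumerate(pmf):
            nxt[k] += mass * (1.0 - p)  # this trial fails: count stays at k
            nxt[k + 1] += mass * p      # this trial succeeds: count moves to k + 1
        pmf = nxt
    return pmf

def normal_approx_pmf(probs, k):
    """P(count = k) under N(sum p_i, sum p_i (1 - p_i)) with a continuity correction."""
    mu = sum(probs)
    sigma = math.sqrt(sum(p * (1.0 - p) for p in probs))

    def cdf(x):
        return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

    return cdf(k + 0.5) - cdf(k - 0.5)

if __name__ == "__main__":
    # Hypothetical, unequal success probabilities (0.10 to 0.40) for 200 trials
    probs = [0.1 + 0.05 * (i % 7) for i in range(200)]
    exact = poisson_binomial_pmf(probs)
    err = max(abs(exact[k] - normal_approx_pmf(probs, k)) for k in range(len(exact)))
    print(f"mean={sum(probs):.2f}, largest pmf error={err:.4f}")
```

The dynamic program costs $O(n^2)$ for $n$ trials, which is fine for checking the approximation on moderate counts; for large $n$ the normal approximation is exactly what makes an analytical treatment tractable.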

Abstract

Intuitively, there is a relation between measures of spatial dependence and information theoretical measures of entropy. For instance, we can provide an intuition of why spatial data is special by stating that, on average, spatial data samples contain less information than expected. Similarly, spatial data, e.g., remotely sensed imagery, that is easy to compress is also likely to show significant spatial autocorrelation. Formulating our (highly specific) core concepts of spatial information theory in the widely used language of information theory opens new perspectives on their differences and similarities and also fosters cross-disciplinary collaboration, e.g., with the broader AI/ML communities. Interestingly, however, this intuitive relation is challenging to formalize and generalize, leading prior work to rely mostly on experimental results, e.g., for describing landscape patterns. In this work, we explore the information theoretical roots of spatial autocorrelation, more specifically Moran's I, through the lens of self-information (also known as surprisal) and provide both formal proofs and experiments.
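The intuition that spatially autocorrelated data compresses well can be illustrated with a general-purpose compressor. This sketch uses `zlib` as a rough stand-in for information content (it is not a measure used in the paper) and compares a grid built from constant 8×8 patches with a random rearrangement of exactly the same cell values. Both grids share the same value histogram, so any difference in compressed size comes from the spatial arrangement alone. The grid generator and its parameters are hypothetical.

```python
import random
import zlib

def blocky_grid(n, block, levels, rng):
    """Hypothetical autocorrelated grid: constant values within block x block patches.
    Assumes n is a multiple of block."""
    k = n // block
    patches = [[rng.randrange(levels) for _ in range(k)] for _ in range(k)]
    return [[patches[r // block][c // block] for c in range(n)] for r in range(n)]

def shuffled_grid(grid, rng):
    """Same multiset of cell values, spatial arrangement destroyed."""
    flat = [v for row in grid for v in row]
    rng.shuffle(flat)
    n = len(grid[0])
    return [flat[i:i + n] for i in range(0, len(flat), n)]

def compressed_size(grid):
    """Bytes after zlib compression; cell values must fit in 0..255."""
    return len(zlib.compress(bytes(v for row in grid for v in row), 9))

if __name__ == "__main__":
    rng = random.Random(0)
    smooth = blocky_grid(40, 8, 4, rng)
    noisy = shuffled_grid(smooth, rng)
    print("autocorrelated:", compressed_size(smooth), "bytes")
    print("shuffled:      ", compressed_size(noisy), "bytes")
```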

Paper Structure

This paper contains 15 sections, 7 theorems, 5 equations, and 6 figures.

Introduction
Motivation and Related Works
Method
Problem Setup
Rearrangement of $\bar{I}$
Asymptotic Binomial and Poisson Binomial Distributions
Normal Approximation of Binomial and Poisson Binomial Distributions
Analytical Approximation of the Distribution of $\bar{I}$
Analysis of Approximation Accuracy and Robustness on Synthetic Data
Relaxation of Condition 3: Violation of Approximate Independence
Relaxation of Assumption 3: Different Numbers of Neighbors
Relaxation of Assumption 4: Common Neighbors
Trivial Relaxations
Applications on Real-World Data
Conclusions and Future Work

Key Result

Lemma 3

$\bar{I}$ is a weighted sum of the cardinalities of all possible sets $S_{p,q}$ of $pq$-pairs. Specifically, $\bar{I} = \sum_{p,q} (c_p - \bar{x})(c_q - \bar{x})\, |S_{p,q}|$.
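Lemma 3 can be checked numerically on a small categorical grid. This sketch assumes that $\bar{I}$ is the cross-product sum $\sum_{i}\sum_{j} w_{ij}(x_i - \bar{x})(x_j - \bar{x})$ with binary rook-contiguity weights over ordered neighbor pairs, and that $S_{p,q}$ is the set of ordered neighbor pairs whose values are $(c_p, c_q)$; both conventions are assumptions made for illustration. Grouping the neighbor pairs by their value pair turns the pairwise sum into the weighted sum of counts stated in Lemma 3, so the two functions should agree up to floating-point error.

```python
import random
from collections import Counter

def rook_pairs(n_rows, n_cols):
    """Yield ordered pairs of rook-adjacent cells ((r1, c1), (r2, c2))."""
    for r1 in range(n_rows):
        for c1 in range(n_cols):
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                r2, c2 = r1 + dr, c1 + dc
                if 0 <= r2 < n_rows and 0 <= c2 < n_cols:
                    yield (r1, c1), (r2, c2)

def grid_mean(grid):
    return sum(map(sum, grid)) / (len(grid) * len(grid[0]))

def i_bar_pairwise(grid):
    """Assumed form of I-bar: sum of (x_i - mean)(x_j - mean) over ordered neighbor pairs."""
    m = grid_mean(grid)
    return sum((grid[r1][c1] - m) * (grid[r2][c2] - m)
               for (r1, c1), (r2, c2) in rook_pairs(len(grid), len(grid[0])))

def i_bar_from_counts(grid):
    """Lemma 3 form: sum over value pairs (c_p, c_q) of (c_p - mean)(c_q - mean) |S_{p,q}|."""
    m = grid_mean(grid)
    # |S_{p,q}|: number of ordered neighbor pairs whose values are (c_p, c_q)
    sizes = Counter((grid[r1][c1], grid[r2][c2])
                    for (r1, c1), (r2, c2) in rook_pairs(len(grid), len(grid[0])))
    return sum((cp - m) * (cq - m) * size for (cp, cq), size in sizes.items())

if __name__ == "__main__":
    rng = random.Random(1)
    values = [0.0, 1.0, 2.5]  # hypothetical value scheme c_1, c_2, c_3
    grid = [[rng.choice(values) for _ in range(12)] for _ in range(10)]
    print(i_bar_pairwise(grid), i_bar_from_counts(grid))
```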

Figures (6)

Figure 1: Histograms of $\bar{I}$ for 10,000 randomly generated $40\times40$ grids using rook contiguity. From (a) to (c), the proportion of background values $b$ decreases, i.e., the level of independence decreases. The blue lines show the normal distributions estimated from the histograms of $\bar{I}$; the red lines show the analytical approximations based on Theorem 8 and Theorem 9.
Figure 2: The relation between the approximation accuracy and the level of independence, measured by $b = n_{r_{\max}} / N$, the proportion of background values. At each level of independence, we sample 10,000 random $40\times40$ grids and repeat this 10 times. (a) and (b) plot the standardized difference between the analytical and empirical mean and standard deviation, respectively. (c) plots the KL divergence from the analytical approximation to the empirical distribution.
Figure 3: (a) The relation between the approximation accuracy and the level of perturbation in the number of neighbors, measured by the perturbation rate. (b) The relation between the approximation accuracy and the change in the total number of neighbors, measured by the change rate $\Delta_N/kN$.
Figure 4: (a) The uncorrected analytical $|S_{p,p}|$ consistently underestimates the empirical $|\hat{S}_{p,p}|$, while the corrected analytical $|S_{p,p}|$ approximates it better. (b) The relation between the KL divergence and $n_p$: the larger $n_p$, the more common neighbors and the worse the approximation accuracy.
Figure 5: Slope patches in ascending order of Moran's I from left to right and top to bottom.
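The simulation behind Figures 1 and 2 can be approximated with a small Monte Carlo experiment. The sketch below is not the authors' code: the grid generator (a fraction $b$ of cells receives the background value `values[0]`, the remaining cells draw uniformly from the other values, and positions are shuffled), the form of $\bar{I}$ (cross-product sum over ordered rook-neighbor pairs), and `tail_surprisal_bits` are all hypothetical. The last function illustrates self-information in the generic sense, $-\log_2 P(\bar{I} \ge x)$ under a fitted normal distribution; it is not necessarily the paper's definition of $J$.

```python
import math
import random
import statistics

def random_grid(n, b, values, rng):
    """Hypothetical generator: round(b * n * n) cells get the background value values[0];
    the rest draw uniformly from values[1:]; positions are then shuffled."""
    n_cells = n * n
    n_bg = round(b * n_cells)
    flat = [values[0]] * n_bg + [rng.choice(values[1:]) for _ in range(n_cells - n_bg)]
    rng.shuffle(flat)
    return [flat[i * n:(i + 1) * n] for i in range(n)]

def i_bar(grid):
    """Assumed form of I-bar: cross-product sum over ordered rook-neighbor pairs."""
    n_rows, n_cols = len(grid), len(grid[0])
    mean = sum(map(sum, grid)) / (n_rows * n_cols)
    z = [[v - mean for v in row] for row in grid]
    total = 0.0
    for r in range(n_rows):
        for c in range(n_cols):
            if c + 1 < n_cols:
                total += z[r][c] * z[r][c + 1]  # right neighbor
            if r + 1 < n_rows:
                total += z[r][c] * z[r + 1][c]  # lower neighbor
    return 2.0 * total  # each undirected pair was counted once; ordered pairs double it

def simulate(n_grids, n, b, values, seed=0):
    """Empirical mean and standard deviation of I-bar over random grids."""
    rng = random.Random(seed)
    samples = [i_bar(random_grid(n, b, values, rng)) for _ in range(n_grids)]
    return statistics.fmean(samples), statistics.stdev(samples)

def tail_surprisal_bits(x, mu, sigma):
    """Self-information -log2 P(I_bar >= x) under a normal approximation N(mu, sigma^2)."""
    p = 0.5 * math.erfc((x - mu) / (sigma * math.sqrt(2.0)))
    return -math.log2(p)

if __name__ == "__main__":
    mu, sd = simulate(n_grids=200, n=40, b=0.5, values=[0.0, 1.0, 2.0, 3.0])
    print(f"empirical mean={mu:.2f}, sd={sd:.2f}")
    print(f"surprisal of mean + 2 sd: {tail_surprisal_bits(mu + 2 * sd, mu, sd):.2f} bits")
```

Pure Python keeps the sketch dependency-free but is slow for the 10,000 grids used in the figures; a vectorized version (for example, array shifts in NumPy) would be the natural next step.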

Theorems & Definitions (9)

Definition 1
Definition 2
Lemma 3: Rearrangement of $\bar{I}$ as Cardinality of Sets
Lemma 4: Probability of the Cardinality of Same-Value Sets
Lemma 5: Probability of the Cardinality of Different-Value Sets
Lemma 6: Normal Approximation for Different-Value Sets
Lemma 7: Normal Approximation for Same-Value Sets
Theorem 8: Approximate Mean of $\bar{I}$
Theorem 9: Approximate Variance of $\bar{I}$