Urban Boundary Delineation from Commuting Data with Bayesian Stochastic Blockmodeling: Scale, Contiguity, and Hierarchy
Sebastian Morel-Balbi, Alec Kirkley
TL;DR
The paper analyzes urban boundary delineation from commuting data using stochastic block models (SBMs) and the minimum description length (MDL) principle to achieve principled network partitioning without tunable parameters. It compares microcanonical SBM variants across directed, weighted, and multigraph representations, and introduces a fast greedy agglomerative regionalization algorithm that enforces spatial contiguity while preserving compression. Results show that weighted SBMs, especially nested variants, yield strong data compression across scales, but standard SBMs often produce discontiguous regions; the greedy method delivers contiguous partitions with comparable MDL performance. At the tract and county levels, the approach reveals scale-dependent trade-offs between contiguity, interpretability, and compression, with weighted models generally outperforming multigraphs and counties capturing substantial but not optimal structure. The work provides practical guidelines for selecting network representations and SBM variants and demonstrates a flexible, scalable tool for data-driven urban boundary delineation with broad applicability to mobility networks.
Abstract
A common method for delineating urban and suburban boundaries is to identify clusters of spatial units that are highly interconnected in a network of commuting flows, each cluster signaling a cohesive economic submarket. It is critical that the clustering methods employed for this task are principled and free of unnecessary tunable parameters, to avoid unwanted inductive biases, while remaining scalable to high-resolution mobility networks. Here we systematically assess the benefits and limitations of a wide array of Stochastic Block Models (SBMs), a family of principled, nonparametric models for identifying clusters in networks, for delineating urban spatial boundaries with commuting data. We find that the data compression capability and relative performance of different SBM variants depend heavily on the spatial extent of the commuting network, its aggregation scale, and the method used for weighting network edges. We also construct a new measure to assess the degree to which community detection algorithms find spatially contiguous partitions, and find that traditional SBMs may produce substantial spatial discontiguities that make them challenging to use in general for urban boundary delineation. We propose a fast nonparametric regionalization algorithm that can alleviate this issue: it achieves data compression close to that of unconstrained SBMs while ensuring spatial contiguity, benefits from a deterministic optimization procedure, and generalizes to a wide range of community detection objective functions.
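To make the idea of a contiguity-constrained, objective-agnostic regionalization concrete, the following is a minimal Python sketch of a deterministic greedy agglomerative merge. It is not the authors' algorithm, whose details are not given in this abstract. The function names (`greedy_contiguous_merge`, `make_toy_cost`), the `adjacency` and `flows` inputs, and the penalty `alpha` are all assumptions made for illustration, and the toy cost is a simple stand-in for an SBM description length rather than an actual MDL objective.

```python
def make_toy_cost(flows, alpha=1.0):
    """Hypothetical stand-in objective (NOT an SBM description length):
    commuting flow cut by the partition plus a size penalty per cluster."""
    def cost(clusters):
        member = {u: i for i, c in enumerate(clusters) for u in c}
        cut = sum(w for (u, v), w in flows.items() if member[u] != member[v])
        return cut + alpha * sum(len(c) ** 2 for c in clusters)
    return cost


def greedy_contiguous_merge(adjacency, objective):
    """Greedily merge spatially adjacent clusters while the objective improves.

    adjacency: dict mapping each unit to the set of units it borders.
    objective: callable taking a list of frozensets, returning a cost
               (lower is better). Any partition-level objective can be used.
    """
    clusters = {u: frozenset([u]) for u in adjacency}  # cluster id -> members
    label = {u: u for u in adjacency}                  # unit -> cluster id
    best = objective(list(clusters.values()))
    while True:
        # Only clusters that share a border are merge candidates, so every
        # cluster stays spatially contiguous by construction.
        candidates = sorted({tuple(sorted((label[u], label[v])))
                             for u in adjacency for v in adjacency[u]
                             if label[u] != label[v]})
        best_pair, best_cost = None, best
        for a, b in candidates:
            trial = [c for k, c in clusters.items() if k not in (a, b)]
            trial.append(clusters[a] | clusters[b])
            trial_cost = objective(trial)
            if trial_cost < best_cost:  # strict improvement; ties keep first
                best_pair, best_cost = (a, b), trial_cost
        if best_pair is None:           # no merge lowers the cost: stop
            return list(clusters.values()), best
        a, b = best_pair
        clusters[a] = clusters[a] | clusters.pop(b)
        for u in clusters[a]:
            label[u] = a
        best = best_cost


# Toy input: four units on a line, strong flows inside {0, 1} and {2, 3}.
adjacency = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
flows = {(0, 1): 10, (1, 2): 1, (2, 3): 10}
parts, final_cost = greedy_contiguous_merge(adjacency, make_toy_cost(flows))
```

On this toy input the sketch stops with two contiguous clusters, {0, 1} and {2, 3}. Because candidate merges are visited in sorted order and accepted only on strict improvement, repeated runs give the same partition, which mirrors the deterministic behavior described above. Swapping in a different `objective` callable is how such a scheme could, in principle, accommodate other community detection objectives.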
