A Large-scale Benchmark Dataset for Commuting Origin-destination Matrix Generation

Can Rong, Jingtao Ding, Yan Liu, Yong Li

TL;DR

The paper addresses the challenge of generating reliable commuting origin-destination matrices for regions lacking historical data by introducing a large-scale benchmark dataset covering 3,233 diverse US areas with OD matrices and rich regional attributes. It proposes a new paradigm that treats each area as an attributed directed weighted graph and reframes OD generation as conditional graph edge generation, implemented via a diffusion-based model called WeDAN. Benchmark results show matrix-wise generative models, particularly WeDAN, achieve state-of-the-art performance and better generalization across diverse areas, suggesting a graph-learning approach can capture global mobility patterns beyond single-city transfer. This work enables more generalizable commuting OD generation, with practical impact on urban planning and transportation research by providing a scalable, privacy-preserving data resource and a powerful modeling framework for heterogeneous regions.

Abstract

The commuting origin-destination (OD) matrix is a critical input for urban planning and transportation, describing how many people live in one region and work in another within an area of interest. Despite its importance, obtaining and updating the matrix is challenging due to high costs and privacy concerns. This has spurred research into computational models that generate commuting OD matrices for areas lacking historical data from readily available information. Existing research, however, is largely restricted to one or a few large cities, which prevents these models from being applied effectively to areas with distinct characteristics, particularly towns and rural areas where such data is urgently needed. To address this, we propose a large-scale dataset comprising commuting OD matrices for 3,233 diverse areas across the U.S. For each area, we provide the commuting OD matrix together with regional attributes, including the demographics and points of interest of each region in that area. We believe this comprehensive dataset will facilitate the development of more generalizable commuting OD matrix generation models that can capture the varied patterns of distinct areas. Additionally, we use this dataset to benchmark a set of commuting OD generation models, including physical models, element-wise predictive models, and matrix-wise generative models. Surprisingly, we find that a new paradigm, which treats the whole area together with its commuting OD matrix as an attributed directed weighted graph and generates the weighted edges from the node attributes, achieves the best performance. This may inspire a new research direction from graph learning in this field.
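
To make the graph view in the abstract concrete, here is a minimal sketch of turning one area into an attributed directed weighted graph: regions become nodes carrying attribute vectors, and each nonzero OD entry becomes a directed edge whose weight is the commuter count. The function name `od_to_graph`, the two example attributes, and all numbers are assumptions for illustration, not the dataset's actual schema or the authors' code.

```python
import numpy as np

def od_to_graph(od, node_attrs):
    """Return (nodes, edges) for an attributed directed weighted graph.

    od[i, j]   : commuters living in region i and working in region j
    node_attrs : one attribute row per region (hypothetical features)
    """
    od = np.asarray(od)
    node_attrs = np.asarray(node_attrs)
    if od.ndim != 2 or od.shape[0] != od.shape[1]:
        raise ValueError("OD matrix must be square")
    if node_attrs.shape[0] != od.shape[0]:
        raise ValueError("need one attribute row per region")
    # Each nonzero entry becomes a directed edge (origin, destination, weight).
    origins, dests = np.nonzero(od)
    edges = [(int(i), int(j), float(od[i, j])) for i, j in zip(origins, dests)]
    # Each region becomes a node holding its attribute vector.
    nodes = {i: node_attrs[i] for i in range(od.shape[0])}
    return nodes, edges

# Toy area with 3 regions; every value here is made up.
od = np.array([[0, 12, 3],
               [5, 0, 0],
               [7, 2, 0]])
attrs = np.array([[1000.0, 4.0],   # assumed columns: population, POI count
                  [400.0, 1.0],
                  [650.0, 2.0]])
nodes, edges = od_to_graph(od, attrs)
```

Under this view, generating an OD matrix for a new area amounts to producing the `edges` list given only `nodes`, which is the conditional edge generation task the abstract describes.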

Paper Structure (26 sections, 10 equations, 9 figures, 2 tables, 2 algorithms)


Figures (9)

  • Figure 1: A comparison of the traditional transfer paradigm and our novel generative paradigm for origin-destination matrix generation.
  • Figure 2: Statistical analysis of the large-scale dataset of 3,233 areas in the United States, including the distribution of a) the number of regions in each area, b) the average trip distance in each area, c) the variance of the in/out flow of each region in each area.
  • Figure 3: Visualization of the OD matrices of three areas with different mobility structures: a) monocentric (Maricopa in Arizona), b) polycentric (Alameda in California), and c) smoothly distributed (Contra Costa in California).
  • Figure 4: Distributions of OD flows and outflows in areas of different scales: a) cumulative distribution function of edge weights, and b) probability density function of node degrees at log scale.
  • Figure 5: An example of constructing an attributed directed weighted graph from the spatial characteristics and OD matrix of an area consisting of 5 regions.
  • ...and 4 more figures
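
The quantities named in the captions of Figures 2 and 4 (per-region in/out flow, and the cumulative distribution of edge weights) can be read directly off an OD matrix. The sketch below shows one way to compute them. It assumes the weighted out-degree of a region is its row sum, and `flow_statistics` and the toy matrix are hypothetical, not taken from the paper.

```python
import numpy as np

def flow_statistics(od):
    """Per-region flows and the empirical CDF of nonzero OD edge weights."""
    od = np.asarray(od, dtype=float)
    outflow = od.sum(axis=1)   # commuters leaving each origin region
    inflow = od.sum(axis=0)    # commuters arriving at each destination region
    weights = np.sort(od[od > 0])                         # ascending edge weights
    cdf = np.arange(1, weights.size + 1) / weights.size   # P(weight <= w)
    return outflow, inflow, weights, cdf

# Toy 3-region area; values are made up for illustration.
od = np.array([[0, 12, 3],
               [5, 0, 0],
               [7, 2, 0]])
outflow, inflow, weights, cdf = flow_statistics(od)
variance_of_outflow = outflow.var()  # one scalar per area, as in Figure 2c
```

Plotting `weights` against `cdf` gives a curve of the kind described for Figure 4a; the paper's exact binning and normalization are not stated in this excerpt.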

Theorems & Definitions (4)

  • Definition 1
  • Definition 2
  • Definition 3
  • Definition 4