CityNet: A Comprehensive Multi-Modal Urban Dataset for Advanced Research in Urban Computing

Zhengfei Zheng, Xu Geng, Hai Yang

TL;DR

CityNet tackles the fragmentation of open urban data by introducing a first-of-its-kind multi-modal, spatio-temporally aligned dataset spanning mobility, geography, and meteorology across seven cities. By benchmarking spatio-temporal prediction, transfer learning, and reinforcement learning tasks on CityNet, the paper demonstrates meaningful inter-city correlations and cross-task knowledge transfer. The results establish CityNet as a versatile benchmark for urban computing and reveal actionable insights into how context data such as POIs and weather relate to service data like taxi flows and speeds. The dataset offers avenues for transfer learning, federated learning, and explainable urban analytics, with potential to inform smart city decision-making.

Abstract

Data-driven approaches have emerged as a popular tool for addressing challenges in urban computing. However, current research efforts have primarily focused on limited data sources, which fail to capture the complexity of urban data arising from multiple entities and their interconnections. Therefore, a comprehensive and multifaceted dataset is required to enable more extensive studies in urban computing. In this paper, we present CityNet, a multi-modal urban dataset that incorporates various data, including taxi trajectories, traffic speed, points of interest (POIs), road networks, wind, rain, temperature, and more, from seven cities. We categorize this comprehensive data into three streams: mobility data, geographical data, and meteorological data. We begin by detailing the generation process and basic properties of CityNet. Additionally, we conduct extensive data mining and machine learning experiments, including spatio-temporal prediction, transfer learning, and reinforcement learning, to facilitate the use of CityNet. Our experimental results provide benchmarks for various tasks and methods, and also reveal internal correlations among cities and tasks within CityNet that can be leveraged to improve spatio-temporal forecasting performance. Based on our benchmarking results and the correlations uncovered, we believe that CityNet can significantly contribute to the field of urban computing by enabling research on advanced topics.

Paper Structure

This paper contains 33 sections, 12 equations, 3 figures, 9 tables.

Figures (3)

  • Figure 1: Architecture of CityNet. Left: Three raw data sources of CityNet. Middle: Schematic description of all 8 sub-datasets, whose sources are distinguished by color as shown in Fig. 1(a) and 1(b). Right: Decomposition of the data dimensions into cities and tasks. Directed curves indicate correlations to be discovered in this paper.
  • Figure 2: The average daily mobility pattern of taxi data for (a) Beijing, (b) Chengdu and Xi'an. To obtain these patterns, all values at each timestamp are aggregated across all days; solid lines show the mean at each timestamp and shaded bands show the standard deviation.
  • Figure 3: The daily average speed in Xi'an with and without rain or fog. Shaded bands represent half of the standard deviation.
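
The aggregation described in the Figure 2 caption can be sketched as follows. This is a minimal illustration and not the authors' code: the function name `daily_pattern`, the number of time slots per day, and the toy series are all assumptions made for the example. It assumes a single region's readings are stored as one series ordered by timestamp and covering a whole number of days.

```python
import numpy as np

def daily_pattern(values, slots_per_day):
    """Mean and standard deviation per time-of-day slot, taken across days.

    `values` is a 1-D sequence ordered by timestamp (hypothetical layout);
    `slots_per_day` is the number of timestamps in one day.
    """
    values = np.asarray(values, dtype=float)
    if values.size % slots_per_day != 0:
        raise ValueError("series must cover a whole number of days")
    # Each row is one day, each column is one time-of-day slot.
    by_day = values.reshape(-1, slots_per_day)
    # Population standard deviation (NumPy default, ddof=0).
    return by_day.mean(axis=0), by_day.std(axis=0)

# Toy data: 3 days with 4 slots per day (values are made up).
series = [1, 2, 3, 4,
          3, 2, 5, 4,
          5, 2, 7, 4]
mean, std = daily_pattern(series, slots_per_day=4)
print(mean)  # per-slot mean across the three days
print(std)   # per-slot standard deviation across the three days
```

Plotting `mean` as a solid line with a band of `mean ± std` would give a curve in the style of Figure 2; using `mean ± 0.5 * std` instead would match the half-standard-deviation band mentioned for Figure 3.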

Theorems & Definitions (6)

  • Definition 1: Taxi GPS Points
  • Definition 2: POI
  • Definition 3: Road segments and real-time speed
  • Definition 4: Region
  • Definition 5: Timestamps
  • Definition 6: Spatio-temporal Tensors