Scalable Analysis of Urban Scaling Laws: Leveraging Cloud Computing to Analyze 21,280 Global Cities

Zhenhui Li, Hongwei Zhang, Kan Wu

TL;DR

This paper tackles the challenge of validating urban scaling laws across 21,280 globally diverse cities using large-scale, cloud-native geospatial processing. It introduces a cloud-based platform built on ODPS that performs vector and raster polygon queries over hundreds of millions to billions of geospatial data points, reducing computation time from days to minutes. The study finds that scaling exponents vary with development status and indicator: built-up surface shows the strongest relation to population, while road length and nighttime light exhibit weaker, non-universal fits. Removing low-GDP outliers brings the nighttime light exponent in line with the literature and highlights regional data quality effects. The work demonstrates the practical and scientific value of cloud-based big geospatial data analysis for global city science and offers a scalable framework applicable to a wide range of data-intensive, geography-informed research.
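To make the idea of a vector polygon query concrete, the following is a minimal single-machine sketch, not the paper's ODPS implementation. It counts how many points (for example, road network nodes) fall inside a city boundary using a bounding-box prefilter followed by a standard ray-casting test. The function names, the rectangular boundary, and the sample coordinates are all hypothetical and chosen only for illustration.

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test: return True if (x, y) lies inside the polygon.

    `polygon` is a list of (x, y) vertices in order; the last vertex is
    implicitly connected back to the first.
    """
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed.
        if (y1 > y) != (y2 > y):
            # x-coordinate where the edge crosses that horizontal line.
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def count_points_in_polygon(points, polygon):
    """Count points inside the polygon, skipping those outside its bounding box."""
    xs = [vx for vx, _ in polygon]
    ys = [vy for _, vy in polygon]
    min_x, max_x, min_y, max_y = min(xs), max(xs), min(ys), max(ys)
    count = 0
    for x, y in points:
        in_bbox = min_x <= x <= max_x and min_y <= y <= max_y
        if in_bbox and point_in_polygon(x, y, polygon):
            count += 1
    return count


# Hypothetical city boundary (a simple rectangle) and hypothetical point data.
city_boundary = [(0.0, 0.0), (4.0, 0.0), (4.0, 3.0), (0.0, 3.0)]
road_nodes = [(1.0, 1.0), (2.0, 2.5), (5.0, 1.0), (-1.0, 2.0), (3.9, 0.1)]

n_inside = count_points_in_polygon(road_nodes, city_boundary)
print(n_inside)
```

Running this test once per point and per city is what becomes expensive at the scale described above, which is the motivation for distributing the query across a cloud platform.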

Abstract

Cities play a pivotal role in human development and sustainability, yet studying them presents significant challenges due to the vast scale and complexity of spatial-temporal data. One such challenge is the need to uncover universal urban patterns, such as the urban scaling law, across thousands of cities worldwide. In this study, we propose a novel large-scale geospatial data processing system that enables city analysis on an unprecedented scale. We demonstrate the system's capabilities by revisiting the urban scaling law across 21,280 cities globally, using a range of open-source datasets including road networks, nighttime light intensity, built-up areas, and population statistics. Analyzing the characteristics of 21,280 cities involves querying over half a billion geospatial data points, a task that traditional Geographic Information Systems (GIS) would take several days to process. In contrast, our cloud-based system accelerates the analysis, reducing processing time to just minutes while significantly lowering resource consumption. Our findings reveal that the urban scaling law varies across cities in under-developed, developing, and developed regions, extending the insights gained from previous studies focused on hundreds of cities. This underscores the critical importance of cloud-based big data processing for efficient, large-scale geospatial analysis. As the availability of satellite imagery and other global datasets continues to grow, the potential for scientific discovery expands exponentially. Our approach not only demonstrates how such large-scale tasks can be executed efficiently but also offers a powerful solution for data scientists and researchers working in the fields of city and geospatial science.
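For readers unfamiliar with the urban scaling law, it is commonly written as a power law $Y = Y_0 N^{\beta}$, where $N$ is city population, $Y$ is an urban indicator, and $\beta$ is the scaling exponent. The sketch below estimates $\beta$ and $R^2$ by ordinary least squares on log-transformed values. This is the usual textbook approach and is offered as an assumption, since this excerpt does not show the paper's exact fitting procedure. The function name and the data are hypothetical; the data are synthetic, generated with a known exponent of 0.85, and are not results from the study.

```python
import math


def fit_scaling_exponent(populations, values):
    """Fit Y = Y0 * N**beta via least squares on log(N) and log(Y).

    Returns (beta, Y0, r_squared).
    """
    xs = [math.log(n) for n in populations]
    ys = [math.log(v) for v in values]
    count = len(xs)
    mean_x = sum(xs) / count
    mean_y = sum(ys) / count
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    beta = sxy / sxx                      # slope in log-log space
    log_y0 = mean_y - beta * mean_x       # intercept in log-log space
    r_squared = (sxy * sxy) / (sxx * syy) if syy > 0 else 1.0
    return beta, math.exp(log_y0), r_squared


# Synthetic, made-up data: indicator generated exactly as 2.0 * N**0.85.
populations = [1e4, 5e4, 1e5, 5e5, 1e6, 5e6]
built_up = [2.0 * n ** 0.85 for n in populations]

beta, y0, r2 = fit_scaling_exponent(populations, built_up)
print(f"beta={beta:.3f}, Y0={y0:.3f}, R^2={r2:.3f}")
```

With real city data the points scatter around the fitted line, so $R^2$ falls below 1; a low $R^2$ is what signals a weak fit for a given indicator.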

Paper Structure

This paper contains 18 sections, 3 equations, 7 figures, 7 tables, 2 algorithms.

Figures (7)

  • Figure 1: Boundary of Oklahoma City, USA, in vector representation (top) and raster representation (bottom).
  • Figure 2: Locations of 21,280 cities in our study.
  • Figure 3: Distribution of road network nodes from OSM.
  • Figure 4: Scaling exponents ($\beta$) for example city properties. The $R^2$ values for road length and nighttime light indicate weak correlations. Built-up surface area shows a stronger correlation and is the only property whose $\beta$ is close to the values reported in the literature.
  • Figure 5: Running time for each of the 100 cities using MySQL in vector representation.
  • ...and 2 more figures