State estimation of urban air pollution with statistical, physical, and super-learning graph models

Matthieu Dolbeault, Olga Mula, Agustín Somacal

TL;DR

This work tackles real-time state estimation of urban NO$_2$ concentrations by modeling the city as a metric/quantum graph and fusing heterogeneous data sources (sensor measurements, meteorology, and traffic-derived emissions). It develops and compares a spectrum of reconstruction methods (spatial average, BLUE, kriging, source-emission models, and physics-based elliptic diffusion on graphs), then couples them in an ensemble super-learning framework to improve accuracy. Reduced-order and physics-informed approaches keep the computational cost manageable while capturing key spatial dynamics. The methods are validated on Paris data, using leave-one-out cross-validation to mitigate the limited sensor coverage. The ensemble method performs robustly across stations, which highlights the value of integrating data-driven and physics-driven models for real-time urban pollution mapping and points to data quality and topography as avenues for future gains.

Abstract

We consider the problem of real-time reconstruction of urban air pollution maps. The task is challenging due to the heterogeneous sources of available data, the scarcity of direct measurements, the presence of noise, and the large surfaces that need to be considered. In this work, we introduce different reconstruction methods based on posing the problem on city graphs. Our strategies can be classified as fully data-driven, physics-driven, or hybrid, and we combine them with super-learning models. The performance of the methods is tested in the case of the inner city of Paris, France.

Paper Structure (26 sections, 46 equations, 6 figures)

Figures (6)

  • Figure 1: Cropped Google Map screenshot of Paris and the $m=13$ available stations in the study: red dots represent the projection of the station locations to the nearest vertex in the graph of streets, while the blue crosses correspond to the exact position of the station.
  • Figure 2: Raw data from Google Maps: the image contains the city with its main landmarks, and some streets are highlighted with one of the four colors corresponding to traffic.
  • Figure 3: The metric graph downloaded from OpenStreetMap, with the edges that never had Google Traffic activation in red, and the edges remaining after filtration in yellow.
  • Figure 4: Correlation between stations as a function of the distance. The vertical dashed red line marks the maximal separation between vertex and station (165 m), which still lies in the zone of high correlation.
  • Figure 5: Root mean square error on tested stations for the different proposed methods.
  • ...and 1 more figures

Theorems & Definitions (2)

  • Remark 3.1
  • Remark 3.2