
TopoFlow: Physics-guided Neural Networks for high-resolution air quality prediction

Ammar Kheder, Helmi Toropainen, Wenqing Peng, Samuel Antão, Jia Chen, Zhi-Song Liu, Michael Boy

TL;DR

TopoFlow achieves performance gains that are consistent across all four major pollutants and forecast lead times from 12 to 96 hours, demonstrating that principled integration of physical knowledge into neural networks can fundamentally advance air quality prediction.

Abstract

We propose TopoFlow (Topography-aware pollutant Flow learning), a physics-guided neural network for efficient, high-resolution air quality prediction. To explicitly embed physical processes into the learning framework, we identify two critical factors governing pollutant dynamics: topography and wind direction. Complex terrain can channel, block, and trap pollutants, while wind acts as a primary driver of their transport and dispersion. Building on these insights, TopoFlow leverages a vision transformer architecture with two novel mechanisms: topography-aware attention, which explicitly models terrain-induced flow patterns, and wind-guided patch reordering, which aligns spatial representations with prevailing wind directions. Trained on six years of high-resolution reanalysis data assimilating observations from over 1,400 surface monitoring stations across China, TopoFlow achieves a PM$_{2.5}$ RMSE of 9.71 $\mu$g/m$^3$, representing a 71-80% improvement over operational forecasting systems and a 13% improvement over state-of-the-art AI baselines. Forecast errors remain well below China's 24-hour air quality threshold of 75 $\mu$g/m$^3$ (GB 3095-2012), enabling reliable discrimination between clean and polluted conditions. These performance gains are consistent across all four major pollutants and forecast lead times from 12 to 96 hours, demonstrating that principled integration of physical knowledge into neural networks can fundamentally advance air quality prediction.
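As a reading aid for the headline numbers, the sketch below shows how an RMSE and a relative improvement are conventionally computed, and how the reported PM2.5 RMSE compares with the 75 ug/m3 threshold. The function names and the toy prediction values are hypothetical. Only the 9.71 and 75 figures come from the abstract, and the abstract does not state which exact formula the authors used for the improvement percentages.

```python
import math


def rmse(pred, truth):
    """Root-mean-square error between two equal-length sequences."""
    if len(pred) != len(truth):
        raise ValueError("pred and truth must have the same length")
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, truth)) / len(pred))


def relative_improvement(baseline_rmse, model_rmse):
    """Fractional RMSE reduction relative to a baseline (0.13 means 13% lower)."""
    return (baseline_rmse - model_rmse) / baseline_rmse


# Toy concentrations in ug/m3 (hypothetical, not taken from the paper).
toy_pred = [30.0, 42.0, 55.0]
toy_truth = [33.0, 38.0, 55.0]
toy_rmse = rmse(toy_pred, toy_truth)

# Figures reported in the abstract.
TOPOFLOW_PM25_RMSE = 9.71   # ug/m3
GB3095_24H_LIMIT = 75.0     # ug/m3, GB 3095-2012 24-hour threshold

# Share of the regulatory limit taken up by the reported error.
fraction_of_limit = TOPOFLOW_PM25_RMSE / GB3095_24H_LIMIT
print(f"toy RMSE: {toy_rmse:.3f} ug/m3")
print(f"reported RMSE is {fraction_of_limit:.1%} of the 24-hour limit")
```

Under this convention, a baseline RMSE of 10.0 and a model RMSE of 8.7 would give `relative_improvement(10.0, 8.7)` of about 0.13.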

Paper Structure (29 sections, 13 equations, 9 figures, 9 tables, 1 algorithm)


Figures (9)

  • Figure 1: TopoFlow architecture for physics-guided air quality prediction. The model takes as input concentrations of six air pollutants, major meteorological data, population density, spatial coordinates, time stamps, and a topographic map, and outputs pollutant concentrations at lead times from 12 to 96 hours. All input data are stacked into a multi-layer 2D map, then cropped into non-overlapping patches. TopoFlow shuffles patch order based on the wind field within each sector, then processes patches through a Swin Transformer backbone. The topographic map introduces attention bias for topography-aware feature representation.
  • Figure 2: Overall performance of air pollution prediction. (a), Ground truth (CAQRA reanalysis) PM$_{2.5}$ concentrations. (b), TopoFlow PM$_{2.5}$ predictions. (c), Distribution of prediction error ($|\hat{y} - y|$, where $\hat{y}$ is the model prediction and $y$ the CAQRA reanalysis) across lead times, comparing TopoFlow, ClimaX, and AirCast. Box plots indicate the median (middle line), 25th and 75th percentiles (box), and 5th and 95th percentiles (whiskers). (d), Spatial distribution of TopoFlow bias in Sichuan (a region of complex terrain), showing underestimation (blue) in the basin interior and overestimation (red) near elevated margins, consistent with residual difficulty in resolving sharp terrain-induced concentration gradients at the plateau-basin interface.
  • Figure 3: Forecast skill as a function of lead time for six air pollutants. RMSE validated against OpenAQ stations across China for 2019. a, PM$_{2.5}$. b, PM$_{10}$. c, NO$_2$. d, SO$_2$. e, CO. f, O$_3$. TopoFlow (green) achieves the lowest errors for particulate matter and NO$_2$. Aurora (purple) shows superior performance for O$_3$ and CO, which require three-dimensional atmospheric representation to capture stratospheric intrusions and vertical transport.
  • Figure 4: Seasonal PM$_{2.5}$ distribution from forecasts, reanalysis, and observations. (a--d), CAMS forecasts. (e--h), Aurora predictions. (i--l), CAQRA reanalysis. (m--p), TopoFlow predictions. (q--t), OpenAQ measurements. Columns: Winter (15 January 2019), Spring (1 March 2019), Summer (12 July 2019), Autumn (19 October 2019). TopoFlow achieves the lowest RMSE (25.9 $\mu$g/m$^3$) against independent stations, outperforming CAQRA (37.0 $\mu$g/m$^3$), CAMS (44.1 $\mu$g/m$^3$), and Aurora (49.8 $\mu$g/m$^3$). Relative to China's 24-hour PM$_{2.5}$ threshold of 75 $\mu$g/m$^3$ (GB 3095-2012), only TopoFlow and CAQRA maintain errors below 50% of the regulatory limit.
  • Figure 5: Topographic blocking in Sichuan Basin. (a), CAQRA (ground truth) PM$_{2.5}$ distribution across China on 7 July 2018, 20:00 UTC, with the study region marked by the green box. (b), CAQRA (ground truth) PM$_{2.5}$ concentrations and wind vectors at forecast time (8 July 2018, 08:00 UTC) within the Sichuan Basin. The green line indicates the 30.0°N transect. (c), TopoFlow 12-hour prediction, achieving a spatial correlation of r = 0.90. (d), West-east transect along 30.0°N comparing CAQRA (black), TopoFlow (red), AirCast (orange dashed), and ClimaX (blue dotted) against the elevation profile (gray shading). Inset table reports the terrain-induced concentration gradient $\Delta = C_{\text{basin,max}} - C_{\text{plateau,min}}$ for each model. (e), Schematic of the topographic blocking mechanism: the Tibetan Plateau blocks westerly winds, trapping pollutants within the basin.
  • ...and 4 more figures
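
The terrain-induced concentration gradient in the Figure 5 caption, $\Delta = C_{\text{basin,max}} - C_{\text{plateau,min}}$, can be illustrated with a minimal sketch. The caption gives only the formula, so the way transect points are assigned to basin or plateau here (a single elevation cutoff, `plateau_min_elev`) is an assumption, as are the function name and all toy values.

```python
def terrain_gradient(concentration, elevation, plateau_min_elev):
    """Delta = C_basin,max - C_plateau,min along one transect.

    Assumption: points with elevation >= plateau_min_elev count as plateau,
    all others as basin. The source does not say how the split is made.
    """
    if len(concentration) != len(elevation):
        raise ValueError("concentration and elevation must have the same length")
    basin = [c for c, z in zip(concentration, elevation) if z < plateau_min_elev]
    plateau = [c for c, z in zip(concentration, elevation) if z >= plateau_min_elev]
    if not basin or not plateau:
        raise ValueError("transect must contain both basin and plateau points")
    return max(basin) - min(plateau)


# Hypothetical west-to-east transect: elevation in m, PM2.5 in ug/m3.
elevation = [4200.0, 3800.0, 2500.0, 600.0, 450.0, 500.0]
reference = [8.0, 10.0, 18.0, 52.0, 61.0, 57.0]   # stand-in for a reanalysis
smoothed = [12.0, 15.0, 25.0, 45.0, 50.0, 48.0]   # stand-in for a smoother model

delta_reference = terrain_gradient(reference, elevation, plateau_min_elev=2000.0)
delta_smoothed = terrain_gradient(smoothed, elevation, plateau_min_elev=2000.0)
print(delta_reference, delta_smoothed)
```

In this toy case the smoother field yields a smaller $\Delta$ than the reference, which is the kind of difference an inset table of per-model gradients would expose.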