How many stations are sufficient? Exploring the effect of urban weather station density reduction on imputation accuracy of air temperature and humidity

Marvin Plein, Carsten F. Dormann, Andreas Christen

TL;DR

The paper asks how far dense urban weather station networks can be thinned without severely compromising the imputation of air temperature ($T_a$) and relative humidity ($RH$). It applies a data-driven stepwise station removal to Freiburg's 42-station WSN: Extreme Gradient Boosting models are trained in a backward-elimination framework, validated with 10-fold CV, and compared against GLM and SUEWS baselines. Results show that substantial thinning is possible (e.g., from 42 down to between 7 and 4 stations) with modest RMSE increases (approximately 0.13 K in $T_a$ and 0.29% in $RH$ at mid-density reductions), though remote forest stations remain challenging and edge locations carry most of the information for city-wide patterns. The study provides practical guidance for cost-effective urban climate monitoring, highlighting the value of retaining 10–25% of strategically chosen stations and suggesting adaptive network configurations for future WSN deployments.
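
The backward-elimination idea can be illustrated with a minimal sketch. It is not the authors' implementation: the Extreme Gradient Boosting models are swapped for a deliberately trivial stand-in predictor (mean of the retained stations plus a per-station offset), the data are synthetic, and every name (`network_error`, `backward_elimination`, `make_demo_data`, `split`) is hypothetical. Only the greedy loop, which repeatedly drops the station whose removal costs the least accuracy, reflects the general technique.

```python
import math
import random


def rmse(pred, obs):
    """Root-mean-square error between two equally long sequences."""
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(pred, obs)) / len(obs))


def network_error(data, retained, split):
    """Mean RMSE over all stations, each predicted from the retained subset.

    Stand-in predictor (an assumption, not the paper's EGB model): the mean of
    the retained stations at each time step plus a per-station offset that is
    fitted on time steps [0, split) and evaluated on [split, end).
    """
    n = len(next(iter(data.values())))
    ref = [sum(data[s][t] for s in retained) / len(retained) for t in range(n)]
    errors = []
    for series in data.values():
        offset = sum(series[t] - ref[t] for t in range(split)) / split
        pred = [ref[t] + offset for t in range(split, n)]
        errors.append(rmse(pred, series[split:]))
    return sum(errors) / len(errors)


def backward_elimination(data, split, min_stations=1):
    """Greedily remove stations until only min_stations remain."""
    retained = sorted(data)
    history = [(len(retained), network_error(data, retained, split))]
    while len(retained) > min_stations:
        # Try dropping each remaining station; keep the cheapest removal.
        drop = min(
            retained,
            key=lambda s: network_error(
                data, [r for r in retained if r != s], split
            ),
        )
        retained.remove(drop)
        history.append((len(retained), network_error(data, retained, split)))
    return retained, history


def make_demo_data(n_stations=6, n_steps=200, seed=0):
    """Synthetic hourly temperatures: shared diurnal cycle, offset, noise."""
    rng = random.Random(seed)
    data = {}
    for i in range(n_stations):
        offset = rng.uniform(-2.0, 2.0)
        data[f"S{i}"] = [
            15.0 + 5.0 * math.sin(2 * math.pi * t / 24) + offset
            + rng.gauss(0.0, 0.3)
            for t in range(n_steps)
        ]
    return data


if __name__ == "__main__":
    data = make_demo_data()
    retained, history = backward_elimination(data, split=100, min_stations=2)
    for n, err in history:
        print(f"{n} stations retained: mean RMSE = {err:.3f} K")
    print("retained:", retained)
```

Fitting the offset on the first half of the series and scoring on the second half loosely mirrors the idea of learning from a period of full deployment and predicting a later period; a real application would replace the stand-in predictor with a proper regression model and cross-validation.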

Abstract

Urban weather station networks (WSNs) are widely used to monitor urban weather and climate patterns and aid urban planning. However, maintaining WSNs is expensive and labor-intensive. Here, we present a step-wise station removal procedure to thin an existing WSN in Freiburg, Germany, and analyze the ability of WSN subsets to reproduce air temperature and humidity patterns of the entire original WSN for a year following a simulated reduction of WSN density. We found that substantial reductions in station numbers after one year of full deployment are possible while retaining high predictive accuracy. A reduction from 42 to 4 stations, for instance, increased mean prediction RMSEs from 0.69 K to 0.83 K for air temperature and from 3.8% to 4.4% for relative humidity, corresponding to RMSE increases of only 20% and 16%, respectively. Predictive accuracy is worse for remote stations in forests than for stations in built-up or open settings, but consistently better than a state-of-the-art numerical urban land-surface model (Surface Urban Energy and Water Balance Scheme). Stations located at the edges between built-up and rural areas are most valuable when reconstructing city-wide climate characteristics. Our study demonstrates the potential of thinning WSNs to maximize the efficient allocation of financial and personnel-related resources in urban climate research.
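
As a quick check of the arithmetic in the abstract, the relative RMSE increases follow directly from the reported before and after values. The helper name `relative_increase` is introduced here for illustration only; the four input numbers are taken from the abstract.

```python
def relative_increase(before, after):
    """Percentage increase from `before` to `after`."""
    return (after - before) / before * 100.0


# RMSE values reported for the reduction from 42 to 4 stations.
ta_increase = relative_increase(0.69, 0.83)  # air temperature, in K
rh_increase = relative_increase(3.8, 4.4)    # relative humidity, in %

print(f"T_a RMSE increase: {ta_increase:.1f}%")
print(f"RH RMSE increase:  {rh_increase:.1f}%")
```

Rounded to whole percentages, these give the 20% and 16% quoted in the abstract.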

Paper Structure

This paper contains 23 sections, 1 equation, 16 figures, 5 tables, 1 algorithm.

Figures (16)

  • Figure 1: Overview of the study area and the weather station locations. Figure taken from plein2025gapfilling.
  • Figure 2: Station-wise $T_a$ prediction RMSEs of EGB 1$\rightarrow$2 models across the number of stations retained. Values on top of the plot denote the average RMSE across all target stations and, in parentheses, the percentage increase averaged across all stations compared to the full models. Percentages on the left refer to the increase in RMSE relative to the EGB 1$\rightarrow$1 models; percentages on the right give the station-wise percentage RMSE increase of the models featuring two stations relative to the full models.
  • Figure 3: Station-wise $RH$ prediction RMSEs of EGB 1$\rightarrow$2 models across the number of stations retained. See Figure 2 for further explanations.
  • Figure 4: $T_a$ prediction errors of EGB 1$\rightarrow$2 models across all stations and the entire second study year, split by daytime (top) vs. nighttime (bottom) and for the entire study period (left) vs. under hot conditions (right). Black horizontal lines denote the 1st, 5th, 50th (median), 95th, and 99th error percentiles.
  • Figure 5: Station-wise prediction bias of $T_a$ across CV folds for the first (EGB 1$\rightarrow$1) and second (EGB 1$\rightarrow$2) study year.
  • ...and 11 more figures