Augmenting Ground-Level PM2.5 Prediction via Kriging-Based Pseudo-Label Generation

Lei Duan, Ziyang Jiang, David Carlson

TL;DR

The proposed data augmentation strategy improves the performance of the state-of-the-art convolutional neural network-random forest (CNN-RF) model, increasing spatial correlation and reducing prediction error.

Abstract

Fusing abundant satellite data with sparse ground measurements is a major challenge in climate modeling. To address this, we propose augmenting the training dataset with unlabeled satellite images paired with pseudo-labels generated by ordinary kriging, a spatial interpolation technique, thereby making full use of the available satellite data. We show that this data augmentation strategy improves the performance of the state-of-the-art convolutional neural network-random forest (CNN-RF) model, increasing spatial correlation and reducing prediction error.
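The pseudo-labeling step relies on ordinary kriging to interpolate station measurements onto the locations of unlabeled satellite images. The NumPy sketch below illustrates that step under stated assumptions and is not the authors' implementation: coordinates are treated as planar, the semivariogram parameters are fixed rather than fitted (Figure A1 compares spherical, exponential, and Gaussian fits), the toy data are made up, and the names `ordinary_kriging`, `spherical`, `exponential`, `gaussian`, and `range_` are hypothetical.

```python
import numpy as np


# Semivariogram models with zero nugget. `sill` and `range_` are hypothetical
# defaults; in practice they are fitted to an empirical semivariogram.
def spherical(h, sill=1.0, range_=1.0):
    r = np.minimum(np.asarray(h, dtype=float) / range_, 1.0)
    return sill * (1.5 * r - 0.5 * r**3)


def exponential(h, sill=1.0, range_=1.0):
    # "Practical range" convention: about 95% of the sill is reached at h = range_.
    return sill * (1.0 - np.exp(-3.0 * np.asarray(h, dtype=float) / range_))


def gaussian(h, sill=1.0, range_=1.0):
    # Without a nugget this model can make the kriging system ill-conditioned.
    return sill * (1.0 - np.exp(-3.0 * (np.asarray(h, dtype=float) / range_) ** 2))


def ordinary_kriging(coords, values, targets, variogram=spherical, **params):
    """Interpolate `values` observed at `coords` onto `targets`.

    Solves [[Gamma, 1], [1^T, 0]] @ [w; mu] = [gamma_0; 1] for every target,
    so the weights w sum to one (the unbiasedness constraint).
    """
    coords = np.asarray(coords, dtype=float)
    values = np.asarray(values, dtype=float)
    targets = np.atleast_2d(np.asarray(targets, dtype=float))
    n = len(coords)

    # Left-hand side: station-to-station semivariances bordered by ones.
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    lhs = np.ones((n + 1, n + 1))
    lhs[:n, :n] = variogram(d, **params)
    lhs[n, n] = 0.0

    # Right-hand side: station-to-target semivariances, one column per target.
    d0 = np.linalg.norm(coords[:, None, :] - targets[None, :, :], axis=-1)
    rhs = np.ones((n + 1, len(targets)))
    rhs[:n, :] = variogram(d0, **params)

    weights = np.linalg.solve(lhs, rhs)[:n, :]  # drop the Lagrange multiplier row
    return weights.T @ values


# Made-up toy data: (lon, lat) station coordinates and per-station measurements.
stations = np.array([[0.0, 0.0], [0.3, 0.1], [0.1, 0.4], [0.5, 0.5], [0.2, 0.2]])
measurements = {
    "SLP": np.array([1012.0, 1011.5, 1013.0, 1010.8, 1012.2]),
    "T": np.array([21.0, 22.5, 20.0, 23.0, 21.5]),
    "RH": np.array([60.0, 55.0, 65.0, 50.0, 58.0]),
    "PM2.5": np.array([12.0, 15.0, 9.0, 20.0, 13.0]),
}
image_centers = np.array([[0.25, 0.25], [0.4, 0.3]])  # unlabeled satellite images

# One pseudo-label per attribute per image. A real pipeline would fit a separate
# semivariogram for each attribute; one shared model keeps the sketch short.
pseudo_labels = {
    name: ordinary_kriging(stations, z, image_centers, variogram=spherical, range_=1.0)
    for name, z in measurements.items()
}
```

Because the kriging weights sum to one, a spatially constant field is reproduced exactly, and with a zero nugget the interpolator returns the observed value at each station location. Each entry of `pseudo_labels` holds one interpolated value per image, so every image ends up with the five components listed for Figure 1: the image itself plus SLP, T, RH, and PM$_{2.5}$.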

Paper Structure

This paper contains 16 sections, 6 equations, and 4 figures.

Figures (4)

  • Figure 1: Illustration of the pseudo-label generation process. The ground measurements consist of ground-level PM$_{2.5}$ along with three meteorological attributes: sea level pressure (SLP), temperature (T), and relative humidity (RH). We first create a spatial map of interpolated measurements for the entire study area using ordinary kriging. We then pair unlabeled satellite images with the corresponding interpolated measurements based on their geographical coordinates. Each resulting pseudo-labeled data point thus consists of five components: a satellite image and the interpolated SLP, T, RH, and PM$_{2.5}$ values.
  • Figure 2: Filtering rules for unlabeled satellite images. To be included in the training dataset, an unlabeled image must not lie within the AOI of any AQM station and must have at least $N_{\text{sensor}}$ AQM stations in its vicinity, defined as a circular region of radius $r$. In our experiments, we set $N_{\text{sensor}} = 4$ and $r = 0.2$.
  • Figure 3: Performance of the CNN-RF model on the test data with different numbers of pseudo-labeled images. (a) Root mean square error (RMSE) and mean absolute error (MAE); (b) Pearson R and spatial Pearson R.
  • Figure A1: Empirical semivariograms (blue dots) with three theoretical fits (green lines): spherical, exponential, and Gaussian.