Fine-grained air quality inference based on low-quality sensing data using self-supervised learning

Meng Xu, Ke Han, Weijian Hu, Wen Ji

TL;DR

This work tackles fine-grained air quality inference under limited labeled data by fusing abundant, noisy micro-station readings with sparse, high-fidelity standardized readings. It introduces the Multi-Task Spatio-Temporal Network (MTSTN), a self-supervised framework that learns from unlabeled data through a spatial interpolation-based pretext task while performing core supervised AQ inference, with STL-derived trend features from micro-stations providing stable, informative cues. The approach achieves state-of-the-art accuracy across NO$_2$, O$_3$, and PM$_{2.5}$ in Chengdu, and ablation analyses demonstrate the pivotal role of the self-supervised task, the STL trend feature, gradient-based feature selection, and the adjacency structures. The results indicate MTSTN’s practical value for accurate, affordable AQ inference using pervasive, low-quality sensing data, with robust performance under missing data scenarios and clear interpretability through feature importance rankings.
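To make the idea of a trend feature concrete, the sketch below splits a synthetic hourly series into a trend and a daily seasonal profile. It is a simplified stand-in, not the paper's method: it uses a classical centered moving-average decomposition rather than STL, and the function names `trend_2xm` and `seasonal_profile`, the 24-hour period, and the synthetic data are all assumptions made for illustration.

```python
import math

def trend_2xm(values, period):
    """Centered 2 x period moving average (classical decomposition trend).

    Returns None at positions where the full window does not fit.
    This sketch only handles even periods (e.g. 24 for hourly data).
    """
    if period % 2 != 0:
        raise ValueError("this sketch handles even periods only")
    half = period // 2
    trend = [None] * len(values)
    for i in range(half, len(values) - half):
        window = values[i - half : i + half + 1]
        # The two end points get half weight so the weights sum to `period`.
        total = sum(window) - 0.5 * (window[0] + window[-1])
        trend[i] = total / period
    return trend

def seasonal_profile(values, trend, period):
    """Mean detrended value at each phase of the cycle, shifted to zero mean."""
    buckets = [[] for _ in range(period)]
    for i, (v, t) in enumerate(zip(values, trend)):
        if t is not None:
            buckets[i % period].append(v - t)
    means = [sum(b) / len(b) for b in buckets]
    offset = sum(means) / period
    return [m - offset for m in means]

# Synthetic stand-in for a micro-station series: 8 days of hourly values
# with a slow linear drift plus a daily cycle (illustrative numbers only).
series = [
    30.0 + 0.1 * t + 10.0 * math.sin(2 * math.pi * t / 24)
    for t in range(24 * 8)
]
trend = trend_2xm(series, period=24)
seasonal = seasonal_profile(series, trend, period=24)
```

On this synthetic input the moving average cancels the daily cycle exactly, so the interior trend values follow the linear drift. Real micro-station data would also carry noise and gaps, which is where a robust method such as STL is typically preferred.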

Abstract

Fine-grained air quality (AQ) mapping is made possible by the proliferation of cheap AQ micro-stations (MSs). However, their measurements are often inaccurate and sensitive to local disturbances, in contrast to standardized stations (SSs), which provide accurate readings but are few in number. To simultaneously address the issues of low data quality (MSs) and high label sparsity (SSs), a multi-task spatio-temporal network (MTSTN) is proposed, which employs self-supervised learning to utilize massive unlabeled data, aided by seasonal and trend decomposition of MS data, which offers reliable information as features. The MTSTN is applied to infer NO$_2$, O$_3$ and PM$_{2.5}$ concentrations in a 250 km$^2$ area in Chengdu, China, at a resolution of 500 m $\times$ 500 m $\times$ 1 h. Data from 55 SSs and 323 MSs were used, along with meteorological, traffic, geographic and timestamp data as features. The MTSTN excels in accuracy compared to several benchmarks, and its performance is greatly enhanced by utilizing low-quality MS data. A series of ablation and pressure tests demonstrate the results' robustness and interpretability, showcasing the MTSTN's practical value for accurate and affordable AQ inference.
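The label sparsity mentioned in the abstract can be gauged with a back-of-envelope calculation from the figures it quotes. The sketch below assumes the 250 km$^2$ area tiles exactly into 500 m cells and that every station lies inside it; the paper's actual grid definition may differ, and the variable names are illustrative.

```python
# Figures quoted in the abstract.
area_km2 = 250.0
cell_side_km = 0.5      # 500 m grid resolution
n_ss = 55               # standardized stations (labeled)
n_ms = 323              # micro-stations (low-quality readings)

# Assumption: the study area is tiled exactly by square cells.
cell_area_km2 = cell_side_km ** 2
n_cells = round(area_km2 / cell_area_km2)

# Each standardized station labels at most one cell, and several
# stations could share a cell, so this is an upper bound.
max_labeled_cells = min(n_ss, n_cells)
min_unlabeled_share = 1 - max_labeled_cells / n_cells

# Hourly resolution: cell-hours to infer per day.
cell_hours_per_day = n_cells * 24

print(n_cells, max_labeled_cells, f"{min_unlabeled_share:.1%}", cell_hours_per_day)
```

Under these assumptions there are about 1000 grid cells, of which at most 55 carry a standardized-station label in any given hour, so at least 94.5% of the cells have to be inferred.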

Paper Structure (35 sections, 17 equations, 14 figures, 4 tables)

This paper contains 35 sections, 17 equations, 14 figures, 4 tables.

Figures (14)

  • Figure 1: Distribution of 378 air quality monitoring stations (55 standardized stations and 323 micro-stations) across the central region of Chengdu, China. The context (labeled) grids have standardized stations, while the target (unlabeled) grids do not.
  • Figure 2: NO$_2$ concentrations and their decomposition results during Mar. 3-10, 2022.
  • Figure 3: Evaluation strategy and dataset segmentation.
  • Figure 4: Key feature ranking.
  • Figure 5: Data missing ratio study.
  • ...and 9 more figures