Air Pollution Forecasting in Bucharest

Dragoş-Andrei Şerban, Răzvan-Alexandru Smădu, Dumitru-Clementin Cercel

TL;DR

The paper tackles PM2.5 forecasting in Bucharest by evaluating a broad spectrum of models, from traditional linear and ensemble methods to deep learning, transformers, and LLM-based approaches, across multiple horizons (1, 2, and 4 hours, with some 8-hour cases). It introduces a Bucharest-specific dataset with extensive pollutant and meteorological features, built after comprehensive preprocessing that includes outlier handling via FBEWMA and lag-feature engineering. The study finds that transformer-based models generally provide the best predictive performance, with advanced RNNs and hybrid architectures also performing well, while LLMs with RAG offer limited improvements. Limitations include reliance on a single measurement station and the absence of exogenous data such as traffic. The authors suggest future work with multi-station data and richer external features to further improve forecasts and capture seasonality and spatial variability.
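The two preprocessing steps named in the TL;DR can be illustrated with a small sketch. It reads FBEWMA as a forward-backward exponentially weighted moving average: one smoothing pass in each direction, averaged. Everything else is an assumption made for illustration and is not taken from the paper: the function names, the smoothing factor `alpha`, the residual threshold, the robust scale estimate, and hourly sampling (so a horizon of 2 means two hours ahead).

```python
from statistics import median


def ewma(values, alpha):
    """Exponentially weighted moving average, seeded with the first value."""
    smoothed = [values[0]]
    for x in values[1:]:
        smoothed.append(alpha * x + (1 - alpha) * smoothed[-1])
    return smoothed


def fbewma(values, alpha=0.3):
    """Average of a forward EWMA pass and a backward EWMA pass."""
    forward = ewma(values, alpha)
    backward = ewma(values[::-1], alpha)[::-1]
    return [(f + b) / 2 for f, b in zip(forward, backward)]


def flag_outliers(values, alpha=0.3, threshold=3.0):
    """Flag points whose distance from the FBEWMA curve is unusually large.

    The threshold rule (a multiple of the median absolute residual) is an
    assumption for this sketch, not the paper's stated criterion.
    """
    smooth = fbewma(values, alpha)
    residuals = [abs(x - s) for x, s in zip(values, smooth)]
    scale = median(residuals) or 1.0  # fall back to 1.0 if the median is zero
    return [r > threshold * scale for r in residuals]


def make_lagged_rows(values, n_lags, horizon):
    """Pair the last n_lags observations with the value `horizon` steps ahead."""
    rows = []
    for t in range(n_lags - 1, len(values) - horizon):
        features = values[t - n_lags + 1 : t + 1]
        rows.append((features, values[t + horizon]))
    return rows


if __name__ == "__main__":
    pm25 = [10, 10, 10, 100, 10, 10, 10]  # made-up readings with one spike
    print(flag_outliers(pm25))
    print(make_lagged_rows([1, 2, 3, 4, 5, 6], n_lags=3, horizon=2))
```

Flagged readings would typically be replaced afterwards, for example by interpolation between neighbouring valid values. The lagged rows can then be fed to any of the regression models the paper compares, with one set of rows built per forecasting horizon.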

Abstract

Air pollution, especially particulate matter 2.5 (PM2.5), has become a growing concern in recent years, primarily in urban areas. Exposure to air pollution is linked to numerous health problems, such as the aggravation of respiratory diseases, cardiovascular disorders, lung function impairment, and even cancer or early death. Forecasting future levels of PM2.5 has become increasingly important over the past few years, as it can provide early warnings and help prevent diseases. This paper aims to design, fine-tune, test, and evaluate machine learning models for predicting future levels of PM2.5 over various time horizons. Our primary objective is to assess and compare the performance of multiple models on this forecasting task, ranging from linear regression algorithms and ensemble-based methods to deep learning models, such as advanced recurrent neural networks and transformers, as well as large language models.

Paper Structure

This paper contains 17 sections, 3 equations, 3 figures, 3 tables.

Figures (3)

  • Figure 1: Evolution of PM2.5, air temperature, and wind speed between 2019 and 2023.
  • Figure 2: Distribution of sensor values for all nine attributes in our dataset.
  • Figure 3: Correlation matrix between measured attributes.