Lightweight ML-Based Air Quality Prediction for IoT and Embedded Applications

Md. Sad Abdullah Sami, Mushfiquzzaman Abid

TL;DR

The paper investigates lightweight versus full XGBoost configurations for predicting CO and NO2 using the AirQualityUCI dataset, balancing predictive accuracy with edge-deployment constraints. It introduces a resource-aware framework, evaluates standard regression metrics alongside model size, inference time, and RAM usage, and finds that the full model is more accurate while the tiny model offers substantial efficiency gains suitable for embedded IoT contexts. The key contribution is quantifying the trade-offs between accuracy and resource demands to guide TinyML-informed deployment in urban air quality monitoring. The results support deploying simplified models on constrained devices without severely sacrificing forecast quality, enabling real-time, on-device air quality sensing. The study also outlines limitations and directions for validating across diverse contexts and in real-time streaming settings.

Abstract

This study investigates the effectiveness and efficiency of two variants of the XGBoost regression model, the full-capacity and lightweight (tiny) versions, for predicting the concentrations of carbon monoxide (CO) and nitrogen dioxide (NO2). Using the AirQualityUCI dataset collected over one year in an urban environment, we conducted a comprehensive evaluation based on widely accepted metrics, including Mean Absolute Error (MAE), Root Mean Square Error (RMSE), Mean Bias Error (MBE), and the coefficient of determination (R2). In addition, we assessed resource-oriented metrics such as inference time, model size, and peak RAM usage. The full XGBoost model achieved superior predictive accuracy for both pollutants, while the tiny model, though slightly less precise, offered substantial computational benefits with significantly reduced inference time and model storage requirements. These results demonstrate the feasibility of deploying simplified models in resource-constrained environments at only a modest cost in predictive quality. This makes the tiny XGBoost model suitable for real-time air-quality monitoring in IoT and embedded applications.
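The four accuracy metrics named in the abstract can be sketched in a few lines of plain Python. This is an illustrative sketch, not the authors' evaluation code: the function name `regression_metrics` is hypothetical, and the sign convention for MBE (predicted minus observed, so a positive value means over-prediction) is an assumption, since the abstract does not state which convention the paper uses.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return MAE, RMSE, MBE and R2 for two equal-length sequences.

    Assumption: MBE is computed as mean(predicted - observed).
    """
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")

    n = len(y_true)
    errors = [p - t for t, p in zip(y_true, y_pred)]

    mae = sum(abs(e) for e in errors) / n             # average error magnitude
    rmse = math.sqrt(sum(e * e for e in errors) / n)  # penalizes large errors more
    mbe = sum(errors) / n                             # systematic over/under-prediction

    mean_true = sum(y_true) / n
    ss_res = sum(e * e for e in errors)
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    # R2 is undefined when the observations have zero variance.
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan")

    return {"mae": mae, "rmse": rmse, "mbe": mbe, "r2": r2}


# Toy values for illustration only; these are not measurements from the dataset.
example = regression_metrics([1.0, 2.0, 3.0, 4.0], [1.5, 2.0, 2.5, 4.5])
print({name: round(value, 4) for name, value in example.items()})
```

For the toy inputs, the errors are 0.5, 0, -0.5 and 0.5, which gives MAE = 0.375, RMSE ≈ 0.433, MBE = 0.125 and R2 = 0.85. MAE and RMSE measure error magnitude, while MBE shows whether a model is biased high or low, which is why the abstract lists it separately.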

Paper Structure

This paper contains 12 sections, 7 equations, 5 figures, 2 tables.

Figures (5)

  • Figure 1: The proposed methodology for pollutant prediction using XGBoost, including preprocessing, training, evaluation and resource-focused profiling.
  • Figure 2: Performance comparison of full and tiny XGBoost models on CO and NO2 concentrations. The subplots show evaluation metrics: (a) MAE, (b) RMSE, (c) MBE, and (d) R2.
  • Figure 3: Computational resource usage comparison between full and tiny XGBoost models for CO and NO2 prediction. Metrics include (a) inference time, (b) model size, and (c) peak RAM usage.
  • Figure 4: Inference Time vs R2 score trade-off curve.
  • Figure 5: Model Size vs R2 score trade-off curve.
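
The resource metrics compared in Figure 3 (inference time, model size, peak RAM usage) could be collected along the following lines. This is a rough sketch under stated assumptions, not the paper's profiling procedure: the "model" is a hypothetical stand-in (a list of threshold stumps) so the example runs without XGBoost, the names `make_stumps`, `predict` and `profile_model` are invented for illustration, model size is approximated by the pickled byte length, and `tracemalloc` only tracks allocations made through Python's allocator, so it would under-report memory used by native libraries.

```python
import pickle
import time
import tracemalloc


def make_stumps(n_stumps):
    """Hypothetical stand-in model: (feature index, threshold, leaf value) tuples."""
    return [(i % 3, 0.1 * (i % 10), 0.01) for i in range(n_stumps)]


def predict(stumps, rows):
    """Sum the leaf values of all stumps whose threshold the row exceeds."""
    return [sum(w for f, thr, w in stumps if row[f] > thr) for row in rows]


def profile_model(stumps, rows, repeats=5):
    """Measure serialized size, best-of-N inference time, and peak traced memory."""
    size_bytes = len(pickle.dumps(stumps))  # proxy for on-disk model size

    best_time = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        predict(stumps, rows)
        best_time = min(best_time, time.perf_counter() - start)

    tracemalloc.start()
    predict(stumps, rows)
    _, peak_bytes = tracemalloc.get_traced_memory()  # (current, peak)
    tracemalloc.stop()

    return {"size_bytes": size_bytes, "inference_s": best_time, "peak_bytes": peak_bytes}


# Synthetic input rows with three features each (illustrative values only).
rows = [[(i % 7) / 7.0, (i % 5) / 5.0, (i % 3) / 3.0] for i in range(200)]

full_profile = profile_model(make_stumps(200), rows)  # stand-in for the full model
tiny_profile = profile_model(make_stumps(10), rows)   # stand-in for the tiny model
print("full:", full_profile)
print("tiny:", tiny_profile)
```

Pairing each profile with an accuracy score such as R2 yields the points plotted in trade-off curves like Figures 4 and 5. Timing and memory figures depend on the host machine, so numbers gathered this way on a desktop would need to be measured again on the target embedded device.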