Statistical Study of Sensor Data and Investigation of ML-based Calibration Algorithms for Inexpensive Sensor Modules: Experiments from Cape Point

Travis Barrett, Amit Kumar Mishra

TL;DR

This work tackles the reliability of inexpensive environmental CO$_2$ sensors by collecting low-cost sensor data alongside a co-located, high-fidelity reference dataset and exploring ML-based calibration methods. It introduces a Cape Point dataset featuring sensor drift and operating irregularities, and evaluates Random Forest Regression, Support Vector Regression, 1D-CNN, and 1D-CNN LSTM for automatic calibration, together with statistical and ergodicity analyses. The results indicate that SVR broadly delivers robust calibration with favorable distribution similarity, though the performance of all models degrades over time, which points to non-stationary conditions. The study demonstrates the potential to extend manual calibration lifetimes and enable scalable deployment of low-cost sensor networks for wide-scale environmental monitoring, while also outlining limitations and directions for incorporating additional environmental factors and multi-site data.

Abstract

In this paper we present a statistical analysis of data from inexpensive sensors. We also present the performance of machine learning algorithms when used for automatic calibration of such sensors. For this study we used a low-cost Non-Dispersive Infrared CO$_2$ sensor co-located with a reference site at Cape Point, South Africa (maintained by Weather South Africa). The collected low-cost sensor data and the site truth data are investigated and compared. We compare the performance of Random Forest Regression, Support Vector Regression, 1D Convolutional Neural Network, and 1D-CNN Long Short-Term Memory Network models as methods for automatic calibration, and we examine the statistical properties of these models' predictions. In addition, we investigate how the performance of these algorithms drifts over time.

Paper Structure

This paper contains 29 sections, 9 equations, 15 figures, 14 tables.

Figures (15)

  • Figure 1: The sensor system connected to the backbone of its purpose-built, double-louvered Stevenson screen enclosure during preparation for deployment at Cape Point, South Africa.
  • Figure 2: Histogram of Data Set 1 split into two slices for comparison.
  • Figure 3: Histogram of Data Set 1 split into four slices for comparison.
  • Figure 4: Histogram of Data Set 1 truth data split into two slices for comparison.
  • Figure 5: Histogram of Data Set 1 truth data split into four slices for comparison.
  • ...and 10 more figures