Provably Outlier-resistant Semi-parametric Regression for Transferable Calibration of Low-cost Air-quality Sensors

Divyansh Chaurasia, Manoj Daram, Roshan Kumar, Nihal Thukarama Rao, Vipul Sangode, Pranjal Srivastava, Avnish Tripathi, Shoubhik Chakraborty, Akanksha, Ambasht Kumar, Davender Sethi, Sachchida Nand Tripathi, Purushottam Kar

TL;DR

This work tackles the calibration of low-cost CO sensors across diverse deployment conditions by introducing RESPIRE, an outlier-resistant semi-parametric regression framework. RESPIRE models temperature-dependent sensor sensitivities non-parametrically within a reproducing kernel Hilbert space (RKHS), enabling robust transfer across sites, seasons, and sensors, while providing an interpretable set of weights and a bias term that reveal potential calibration issues. A robust training procedure (SPR), based on an APIS-inspired alternating scheme, detects and downweights outliers, and a hard-thresholding step yields compressed, sparse models without sacrificing performance. Empirical results on a large, multi-site mobile deployment show RESPIRE achieving strong transfer performance, effective sensor-to-sensor adaptation, and the ability to detect anomalies such as swapped sensor readings. The code is publicly available.

Abstract

We present a case study on the calibration of low-cost air-quality (LCAQ) carbon monoxide (CO) sensors from one of the largest multi-site, multi-season, multi-sensor, multi-pollutant mobile air-quality monitoring network deployments in India. LCAQ sensors have been shown to play a critical role in establishing dense, expansive air-quality monitoring networks and in combating elevated pollution levels. Calibrating LCAQ sensors against regulatory-grade monitors is an expensive, laborious and time-consuming process, especially when a large number of sensors are to be deployed in a geographically diverse layout. In this work, we present the RESPIRE technique to calibrate LCAQ sensors to detect ambient CO levels. RESPIRE offers specific advantages over baseline calibration methods popular in the literature, such as improved prediction in cross-site, cross-season, and cross-sensor settings. RESPIRE offers a training algorithm that is provably resistant to outliers and an explainable model with the ability to flag instances of model overfitting. Empirical results are presented based on data collected during an extensive deployment spanning four sites, two seasons and six sensor packages. RESPIRE code is available at https://github.com/purushottamkar/respire.

Paper Structure

This paper contains 7 sections, 3 theorems, 15 equations, 13 figures, 2 tables.

Key Result

Theorem 1

Given independent variables $X_1, X_2$ and Gram matrix $G$ over the auxiliary variables, suppose the (uncorrupted) targets are generated as ${{\mathbf{y}}}^\ast = (X_1GX_1 + X_2GX_2 + G)\pmb{\beta}^\ast$ where $\pmb{\beta}^\ast \in {\mathbb R}^N$ is spanned by the top $s$ eigenvectors of $X_1GX_1 +
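The visible part of the theorem's generative model can be sketched numerically. The snippet below is a minimal illustration, not the authors' implementation: it assumes that $X_1, X_2$ are diagonal matrices holding the per-sample values of the two independent variables, that $G$ comes from an RBF kernel over a single auxiliary variable, and that the data are synthetic. The names `rbf_gram` and `combined_kernel` are hypothetical. The condition on $\pmb{\beta}^\ast$ is cut off in the statement above, so it is not imposed here and $\pmb{\beta}^\ast$ is simply drawn at random.

```python
import numpy as np

def rbf_gram(t, gamma=1.0):
    """Gram matrix over 1-D auxiliary values t (RBF kernel is an assumption)."""
    diff = t[:, None] - t[None, :]
    return np.exp(-gamma * diff ** 2)

def combined_kernel(x1, x2, G):
    """K = X1 G X1 + X2 G X2 + G, with X1 = diag(x1), X2 = diag(x2) (assumption)."""
    X1, X2 = np.diag(x1), np.diag(x2)
    return X1 @ G @ X1 + X2 @ G @ X2 + G

rng = np.random.default_rng(0)
N = 50
x1 = rng.normal(size=N)          # first independent variable (synthetic)
x2 = rng.normal(size=N)          # second independent variable (synthetic)
t = rng.uniform(0.0, 1.0, N)     # auxiliary variable, e.g. temperature (synthetic)

G = rbf_gram(t, gamma=5.0)
K = combined_kernel(x1, x2, G)

# Uncorrupted targets y* = K beta*; beta* is random here because the
# theorem's condition on beta* is truncated in the source.
beta_star = rng.normal(size=N)
y_star = K @ beta_star
```

Under the diagonal assumption, entry $(i, j)$ of the combined matrix equals $(x_{1,i}x_{1,j} + x_{2,i}x_{2,j} + 1)\,G_{ij}$, which is one way to read the model as two sensitivities and a bias term that each vary with the auxiliary variable.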

Figures (13)

  • Figure 1: RESPIRE pseudocode describing the base training and inference procedures and robust learning
  • Figure 2: Leaderboards for train, non-transfer (SS) and transfer (SX, XS, XX) experiments. The height of each bar indicates how many times a method delivered the (tied) best performance across all 6 sensors, on all sites, during training, SS, SX, XS and XX experiments. The baseline methods DT, GBDT and KNN dominate train performance but perform poorly even in the SS non-transfer case, where MLP, KRR and linear (RR) dominate. However, for non-trivial transfers such as SX, XS and XX, RESPIRE is the clear winner whether or not adaptation is offered. MLP and linear (RR) are the next best methods.
  • Figure 3: Two transfer scenarios suggesting that possible zero-shifts in RGI data contribute to poor $R^2$ performance. A simple 1D adapter greatly improves performance in all cases for all methods.
  • Figure 4: All methods offer excellent performance during training.
  • Figure 5: DT, GBDT and KNN show steep drops even with non-transfer SS testing.
  • ...and 8 more figures

Theorems & Definitions (6)

  • Theorem 1
  • proof
  • Lemma 2
  • proof
  • Lemma 3
  • proof