Indoor Air Quality Dataset with Activities of Daily Living in Low to Middle-income Communities

Prasenjit Karmakar, Swadhin Pradhan, Sandip Chakraborty

TL;DR

This paper tackles the paucity of indoor air quality data in developing countries by presenting a large-scale, activity-contextualized IAQ dataset collected in India. The authors deploy a low-cost, multi-sensor platform (DALTON) across 30 sites in four regions over six months, coupled with real-time activity annotations via a speech-to-text app, and provide floor plans to study pollutant spread. The dataset comprises around 89.1 million pollutant samples and 3957 activity annotations, enabling analyses of source emission, ventilation effects, and floor-plan influences, as well as ML tasks like activity recognition and cooking-item classification with strong performance in controlled scenarios. Openly available under AGPL-3.0, the dataset supports data-driven indoor design, smart ventilation policies, and pollution-aware applications in LMIC contexts, with ongoing updates and community contributions.

Abstract

In recent years, indoor air pollution has posed a significant threat to our society, claiming over 3.2 million lives annually. Developing nations, such as India, are most affected, since a lack of knowledge, inadequate regulation, and outdoor air pollution lead to severe daily exposure to pollutants. However, only a limited number of studies have attempted to understand how indoor air pollution affects developing countries like India. To address this gap, we present spatiotemporal measurements of air quality from 30 indoor sites over six months during the summer and winter seasons. The sites are located across four regions spanning rural, suburban, and urban settings, covering the typical low to middle-income population in India. The dataset contains various types of indoor environments (e.g., studio apartments, classrooms, research laboratories, food canteens, and residential households) and can provide the basis for data-driven learning model research aimed at coping with unique pollution patterns in developing countries. This unique dataset demands advanced data cleaning and imputation techniques for handling data that went missing due to power failures or network outages during collection. Furthermore, through a simple speech-to-text application, we provide real-time indoor activity labels annotated by occupants. Environmentalists and ML enthusiasts can therefore use this dataset to understand the complex patterns of pollutants under different indoor activities, identify recurring sources of pollution, forecast exposure, improve floor plans and room structures of modern indoor designs, and develop pollution-aware recommender systems.
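The abstract notes that power failures and network outages leave gaps in the sensor streams. As a minimal sketch of one possible first-pass cleaning step (not the authors' method), the snippet below linearly interpolates only short gaps that have readings on both sides and leaves longer outages untouched. The function name `fill_short_gaps`, the `max_gap` threshold, and the sample values are all hypothetical and chosen for illustration.

```python
from typing import List, Optional


def fill_short_gaps(values: List[Optional[float]], max_gap: int = 3) -> List[Optional[float]]:
    """Linearly interpolate runs of None no longer than max_gap samples.

    Hypothetical helper: gaps that are longer than max_gap, or that touch
    the start or end of the series, are left as None for a later step.
    """
    out = list(values)
    n = len(out)
    i = 0
    while i < n:
        if out[i] is not None:
            i += 1
            continue
        start = i  # first missing index of this gap
        while i < n and out[i] is None:
            i += 1
        end = i  # first index after the gap
        gap = end - start
        # Interpolate only short gaps bounded by readings on both sides.
        if gap <= max_gap and start > 0 and end < n:
            left, right = out[start - 1], out[end]
            step = (right - left) / (gap + 1)
            for k in range(gap):
                out[start + k] = left + step * (k + 1)
    return out


# Illustrative CO2-like readings (made-up values) with a two-sample gap.
readings = [400.0, None, None, 430.0]
filled = fill_short_gaps(readings, max_gap=3)
```

Capping the gap length is a deliberate choice in this sketch: a straight line across a long outage would invent a smooth trend that may hide a real pollution event, so long gaps are better handled by a model-based imputation method or excluded.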

Paper Structure (28 sections, 15 figures, 6 tables)

Figures (15)

  • Figure 1: Overview of our extensive field study and data collection with multiple air quality monitors in a typical indoor environment. The scenario shows four DALTON sensors deployed in a household, using the house's WiFi network to send pollutant readings to the cloud. The occupants actively participate in the study by providing activity and event context (e.g., cooking, eating) via the easy-to-use speech-to-text vocalAnnot Android application.
  • Figure 2: Deployment images from various indoor environments -- (a) Kitchen, (b) Research Lab, (c) Studio Apartment, (d) Food Canteen. The sensor is highlighted with a green outline. We installed at least one sensor in each room (e.g., a typical household can have six rooms). The devices are positioned 1 to 1.5 meters above the ground (i.e., around chest height), subject to the availability of standard power outlets, to accurately quantify the occupants' exposure level.
  • Figure 3: Air quality variation with activities of daily living in the morning. The figure shows CO2, PM2.5, and VOC concentrations in the kitchen, adjacent bedroom, and dining room while lunch is being prepared. Long-term frying (e.g., fish) significantly elevates PM2.5 and VOC levels, which spread to nearby rooms. Meanwhile, pollutants from boiling, heating, or short-term frying remain contained near the source and do not spread severely. Cleaning and mopping activities increase the indoor relative humidity.
  • Figure 4: Split air conditioners compromise ventilation, compared to legacy window air conditioners, in order to improve power efficiency. The figure depicts the accumulation of VOC and CO2 over time when the split AC is on. Meanwhile, we observe consistent ventilation when the window AC is on, achieving healthy air quality over time. The relative humidity is slightly higher with the window AC (i.e., within the comfort range, 30--50%).
  • Figure 5: (a) depicts pollutant accumulation when the kitchen exhaust fan is turned off (i.e., 4.7$\times$ PM2.5). (b) shows the maximum pollutant increase before and after 30 minutes of using mosquito repellent or burning incense sticks (i.e., 4.8$\times$ VOC for burning sticks). Further, we have divided the day into Early Morning (00:00--06:00), Morning (06:00--12:00), Afternoon (12:00--18:00), and Evening (18:00--23:00) hours. (c) shows that in the morning the kitchen has higher CO2, while research labs are impacted in the afternoon and evening due to occupancy. Lastly, (d) shows that PM2.5 is predominant in the kitchen and rooms at any time of the day. The dashed line represents the healthy pollutant level.
  • ...and 10 more figures
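
The Figure 5 caption splits the day into four periods. A minimal sketch of how a timestamp could be mapped to those periods is shown below, for anyone who wants to reproduce a per-period grouping. The function name `period_of_day` is hypothetical, and treating each range as start-inclusive and end-exclusive is an assumption. Because the caption's Evening range ends at 23:00, this sketch returns `None` for 23:00 to 23:59 rather than guessing which period that hour belongs to.

```python
from datetime import datetime
from typing import Optional

# Period boundaries as stated in the Figure 5 caption.
# Assumption: start hour is inclusive, end hour is exclusive.
PERIODS = [
    ("Early Morning", 0, 6),
    ("Morning", 6, 12),
    ("Afternoon", 12, 18),
    ("Evening", 18, 23),
]


def period_of_day(ts: datetime) -> Optional[str]:
    """Return the caption's period label for ts, or None if no range covers it."""
    for label, start_hour, end_hour in PERIODS:
        if start_hour <= ts.hour < end_hour:
            return label
    return None  # 23:00-23:59 is not covered by the caption's ranges


# Example: a reading taken at 07:30 falls in the Morning period.
label = period_of_day(datetime(2024, 1, 1, 7, 30))
```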