Project RISE: Recognizing Industrial Smoke Emissions

Yen-Chia Hsu, Ting-Hao 'Kenneth' Huang, Ting-Yao Hu, Paul Dille, Sean Prendi, Ryan Hoffman, Anastasia Tsuhlares, Jessica Pachuta, Randy Sargent, Illah Nourbakhsh

TL;DR

RISE addresses the scarcity of large-scale, real-world data for industrial smoke recognition by introducing a daytime-only video dataset collected from three industrial facilities through a citizen science workflow. The dataset comprises 12,567 labeled clips (36 frames each, 180x180 pixels) from 19 camera views, covering 30 days spread over two years, which supports cross-view evaluation and temporal analysis. The authors benchmark an RGB-I3D baseline, compare labels from citizen scientists with labels from MTurk workers, and report a survey and qualitative analyses that highlight social-impact insights and design challenges. The work shows how community-driven AI can support environmental justice and inform regulators, while candidly discussing wicked-data problems, data quality, and generalization limits, and it offers a practical path for integrating citizen science with AI for social impact.

Abstract

Industrial smoke emissions pose a significant concern to human health. Prior work has shown that using Computer Vision (CV) techniques to identify smoke as visual evidence can influence the attitude of regulators and empower citizens to pursue environmental justice. However, existing datasets have neither the quality nor the quantity needed to train the robust CV models that air quality advocacy requires. We introduce RISE, the first large-scale video dataset for Recognizing Industrial Smoke Emissions. We adopted a citizen science approach, collaborating with local community members to annotate whether a video clip has smoke emissions. Our dataset contains 12,567 clips from 19 distinct views from cameras that monitored three industrial facilities. These daytime clips span 30 days over two years, including all four seasons. We ran experiments using deep neural networks to establish a strong performance baseline and reveal smoke recognition challenges. Our survey study discussed community feedback, and our data analysis displayed opportunities for integrating citizen scientists and crowd workers into the application of Artificial Intelligence for Social Impact.
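
Because the clips come from 19 distinct camera views, one natural way to test generalization is to hold out entire views, so that no view appears in both the training and test sets. The Python sketch below illustrates that idea only; it is not the authors' split procedure. The record fields (`clip_id`, `view_id`, `has_smoke`), the helper `split_by_view`, the choice of held-out views, and the synthetic records are all assumptions made for illustration and do not reflect the actual RISE metadata format.

```python
import random


def split_by_view(clips, test_views):
    """Partition clip records so that no camera view is shared by both sets.

    `clips` is a list of dicts with a hypothetical `view_id` key;
    `test_views` is an iterable of view ids to hold out for testing.
    """
    held_out = set(test_views)
    train = [c for c in clips if c["view_id"] not in held_out]
    test = [c for c in clips if c["view_id"] in held_out]
    return train, test


# Synthetic records for illustration only (not real RISE labels):
# 190 clips spread evenly over 19 hypothetical view ids, 10 clips per view.
rng = random.Random(0)
clips = [
    {"clip_id": i, "view_id": i % 19, "has_smoke": rng.random() < 0.5}
    for i in range(190)
]

# Hold out three arbitrarily chosen views as unseen test views.
train, test = split_by_view(clips, test_views=[16, 17, 18])

train_views = {c["view_id"] for c in train}
test_views = {c["view_id"] for c in test}
```

Splitting by view rather than by clip avoids the case where a model scores well simply because it has memorized the background of a camera angle it already saw during training.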

Paper Structure

This paper contains 12 sections, 4 figures, and 11 tables.

Figures (4)

  • Figure 1: Dataset samples and the deployed camera system.
  • Figure 2: All views of videos in the RISE dataset. The rightmost four views are from different sites pointing at another facility.
  • Figure 3: The individual mode of the smoke labeling system. Users can scroll the page and click or tap on a video clip to mark it as containing smoke; marked clips are shown with a red border box.
  • Figure 5: True positives in the test set from split $S_0$. The top and bottom rows show the original video frame and the overlaying heatmap of Class Activation Mapping, respectively.