Bridging Synthetic and Real-World Domains: A Human-in-the-Loop Weakly-Supervised Framework for Industrial Toxic Emission Segmentation

Yida Tao, Yen-Chia Hsu

TL;DR

This paper tackles industrial toxic-emission segmentation under scarce pixel-level annotations by introducing CEDANet, a human-in-the-loop, weakly-supervised domain-adaptation framework. It combines citizen-provided video-level labels with a Transmission-guided Bayesian backbone and a local contrastive loss to learn robust, domain-invariant features, enforced by two class-aware domain discriminators via a Gradient Reversal Layer. The approach refines pseudo-labels through frame-level selection guided by citizen input and leverages class-specific alignment to bridge synthetic/real source data with real industrial targets, achieving substantial gains in $F_{1}$ and $IoU_{\text{smoke}}$ and approaching fully supervised performance with far less target-domain annotation. This demonstrates the scalability and cost-efficiency of citizen-science-informed weak supervision for environmental monitoring in data-scarce, high-variability industrial contexts.

Abstract

Industrial smoke segmentation is critical for air-quality monitoring and environmental protection but is often hampered by the high cost and scarcity of pixel-level annotations in real-world settings. We introduce CEDANet, a human-in-the-loop, class-aware domain adaptation framework that integrates weak, citizen-provided video-level labels with adversarial feature alignment. Specifically, we refine pseudo-labels generated by a source-trained segmentation model using citizen votes, and employ class-specific domain discriminators to transfer rich source-domain representations to the industrial domain. Comprehensive experiments on SMOKE5K and custom IJmond datasets demonstrate that CEDANet achieves an F1-score of 0.414 and a smoke-class IoU of 0.261 with citizen feedback, vastly outperforming the baseline model, which scored 0.083 and 0.043, respectively. This represents a five-fold increase in F1-score and a six-fold increase in smoke-class IoU. Notably, CEDANet with citizen-constrained pseudo-labels performs comparably to the same architecture trained on a limited set of 100 fully annotated images (F1-score of 0.418 and IoU of 0.264), demonstrating that it can match small-sample fully supervised accuracy without pixel-level target-domain annotations. Our research validates the scalability and cost-efficiency of combining citizen science with weakly supervised domain adaptation, offering a practical solution for complex, data-scarce environmental monitoring applications.
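
For readers who want to check the reported ratios, the following is a minimal sketch in plain Python of how F1 and smoke-class IoU are conventionally computed for a binary mask, followed by the fold-increase arithmetic. The helper names (`confusion_counts`, `f1_score`, `iou`) and the toy masks are illustrative assumptions, not the authors' evaluation code; only the four scores are taken from the abstract.

```python
def confusion_counts(pred, truth):
    """Count (TP, FP, FN) over two flat 0/1 masks of equal length."""
    tp = sum(1 for p, t in zip(pred, truth) if p == 1 and t == 1)
    fp = sum(1 for p, t in zip(pred, truth) if p == 1 and t == 0)
    fn = sum(1 for p, t in zip(pred, truth) if p == 0 and t == 1)
    return tp, fp, fn


def f1_score(pred, truth):
    """F1 = 2*TP / (2*TP + FP + FN); 0.0 if there are no positives at all."""
    tp, fp, fn = confusion_counts(pred, truth)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def iou(pred, truth):
    """IoU = TP / (TP + FP + FN) for the positive (smoke) class."""
    tp, fp, fn = confusion_counts(pred, truth)
    denom = tp + fp + fn
    return tp / denom if denom else 0.0


# Toy masks (hypothetical data): 2 true positives, 1 false positive, 1 false negative.
pred = [1, 1, 0, 0, 1, 0]
truth = [1, 0, 0, 1, 1, 0]
toy_f1 = f1_score(pred, truth)
toy_iou = iou(pred, truth)

# Fold increases implied by the scores quoted in the abstract.
f1_gain = 0.414 / 0.083   # roughly five-fold
iou_gain = 0.261 / 0.043  # roughly six-fold

# On a single confusion matrix, F1 = 2*IoU / (1 + IoU).
f1_from_iou = 2 * 0.261 / (1 + 0.261)
```

The last line is only a consistency check: on a single confusion matrix the two metrics are tied by that identity, and the reported pair (0.261, 0.414) agrees with it to three decimals. Whether the paper aggregates its scores this way is not stated here.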

Paper Structure

This paper contains 59 sections, 19 equations, 7 figures, 6 tables, 1 algorithm.

Figures (7)

  • Figure 1: Emissions from factories in IJmond region, the Netherlands: the first two on the left show hazardous emissions, while the last two images on the right depict steam.
  • Figure 2: The architecture of the Transmission-guided Bayesian generative network (Yan et al., 2023), which serves as the backbone of our model.
  • Figure 3: Overview of the frame selection and pseudo-label generation pipeline. The process begins by extracting frames $f$ from a video $v$. A pretrained model $\mathcal{M}_{pretrain}$ generates an initial probability map $P(f)$, from which a confidence score $C(f)$ is derived. Top-$k$ candidate frames $f^{*}$ are selected based on confidence and expanded into a temporal window. After a final selection step, probability binarization is applied to the frames in the optimal time window, producing pseudo-labels $\hat{Y}(\mathcal{D}_{target})$ for the selected frames.
  • Figure 4: Overview of our weakly supervised domain adaptation framework. The model consists of a feature generator, a gradient reversal layer, and two category-aware domain discriminators. The feature generator is based on the Transmission-guided Bayesian (TGB) network (Yan et al., 2023).
  • Figure 5: Sample from the target domain IJmond900 dataset. (a) Original image with pixel-level ground-truth masks: high-opacity smoke regions in red and low-opacity regions in blue. (b) Multi-patch cropping strategy: for each annotated smoke area we extract two patches centered with random offsets (red boxes), and one additional patch randomly sampled across the image (green box).
  • ...and 2 more figures
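
The frame selection and pseudo-label pipeline described in the Figure 3 caption can be sketched roughly as follows, in plain Python. This is an illustration under stated assumptions rather than the authors' implementation: the caption does not define $C(f)$, the window width, or the "final selection step", so here the confidence is assumed to be the mean probability over pixels at or above the threshold, the window is a fixed half-width around each candidate, and the final step keeps the window with the highest mean confidence. The citizen gate (skip videos that citizens did not label as smoke) is likewise an assumption about how the video-level labels enter. All function and parameter names are hypothetical.

```python
def confidence(prob_map, thresh=0.5):
    """Assumed C(f): mean probability over pixels predicted as smoke."""
    smoke = [p for row in prob_map for p in row if p >= thresh]
    return sum(smoke) / len(smoke) if smoke else 0.0


def select_window(prob_maps, k=2, half_width=1):
    """Take the top-k frames by confidence, expand each into a temporal
    window, and return the (start, end) window with the highest mean score."""
    scores = [confidence(pm) for pm in prob_maps]
    top_k = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    best, best_score = None, -1.0
    for centre in top_k:
        start = max(0, centre - half_width)
        end = min(len(scores), centre + half_width + 1)
        mean_c = sum(scores[start:end]) / (end - start)
        if mean_c > best_score:
            best, best_score = (start, end), mean_c
    return best


def pseudo_labels(prob_maps, citizen_says_smoke, thresh=0.5, k=2, half_width=1):
    """Binarize P(f) for frames in the chosen window; return {frame_index: mask}.
    Assumption: videos without a citizen smoke label yield no pseudo-labels."""
    if not citizen_says_smoke or not prob_maps:
        return {}
    start, end = select_window(prob_maps, k=k, half_width=half_width)
    return {
        i: [[1 if p >= thresh else 0 for p in row] for row in prob_maps[i]]
        for i in range(start, end)
    }


# Toy video (hypothetical data): four 2x2 probability maps from M_pretrain.
frames = [
    [[0.1, 0.2], [0.1, 0.1]],
    [[0.6, 0.2], [0.1, 0.1]],
    [[0.9, 0.8], [0.1, 0.7]],
    [[0.7, 0.1], [0.1, 0.1]],
]
labels = pseudo_labels(frames, citizen_says_smoke=True)
```

With these toy values the selected window covers frames 2 and 3, so `labels` holds binarized masks for those two frames only. Swapping in a different confidence measure or window rule changes only `confidence` and `select_window`.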