Bridging Synthetic and Real-World Domains: A Human-in-the-Loop Weakly-Supervised Framework for Industrial Toxic Emission Segmentation
Yida Tao, Yen-Chia Hsu
TL;DR
This paper tackles industrial toxic-emission segmentation under scarce pixel-level annotations by introducing CEDANet, a human-in-the-loop, weakly-supervised domain-adaptation framework. It combines citizen-provided video-level labels with a Transmission-guided Bayesian backbone and a local contrastive loss to learn robust, domain-invariant features, enforced by two class-aware domain discriminators via a Gradient Reversal Layer. The approach refines pseudo-labels through frame-level selection guided by citizen input and leverages class-specific alignment to bridge synthetic/real source data with real industrial targets, achieving substantial gains in $F_{1}$ and $IoU_{\text{smoke}}$ and approaching fully supervised performance with far less target-domain annotation. This demonstrates the scalability and cost-efficiency of citizen-science-informed weak supervision for environmental monitoring in data-scarce, high-variability industrial contexts.
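To make the idea of citizen-guided frame selection concrete, the following is a minimal sketch and not the authors' procedure. It assumes each video comes with per-frame binary pseudo-label masks from a source-trained model and a single citizen-provided video-level label. The names `refine_pseudo_labels`, `smoke_fraction`, and the threshold `min_fraction` are hypothetical and introduced only for illustration.

```python
from typing import List, Optional

Mask = List[List[int]]  # binary mask: 1 = smoke pixel, 0 = background


def smoke_fraction(mask: Mask) -> float:
    """Fraction of pixels in the mask that are labelled smoke."""
    total = sum(len(row) for row in mask)
    return sum(sum(row) for row in mask) / total if total else 0.0


def refine_pseudo_labels(
    frame_masks: List[Mask],
    citizen_says_smoke: bool,
    min_fraction: float = 0.01,  # hypothetical threshold, not from the paper
) -> List[Optional[Mask]]:
    """Filter per-frame pseudo-labels using one video-level citizen label.

    Assumed rule (illustrative only):
    - citizens report no smoke: every frame becomes all-background;
    - citizens report smoke: keep frames whose predicted smoke area is at
      least `min_fraction`, and mark the rest as None (excluded).
    """
    refined: List[Optional[Mask]] = []
    for mask in frame_masks:
        if not citizen_says_smoke:
            refined.append([[0] * len(row) for row in mask])
        elif smoke_fraction(mask) >= min_fraction:
            refined.append(mask)
        else:
            refined.append(None)
    return refined
```

Under this assumed rule, the video-level label acts as a consistency check: frames whose pseudo-labels contradict what citizens observed are either overwritten or dropped rather than used as training targets.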
Abstract
Industrial smoke segmentation is critical for air-quality monitoring and environmental protection but is often hampered by the high cost and scarcity of pixel-level annotations in real-world settings. We introduce CEDANet, a human-in-the-loop, class-aware domain adaptation framework that integrates weak, citizen-provided video-level labels with adversarial feature alignment. Specifically, we use citizen votes to refine pseudo-labels generated by a source-trained segmentation model, and employ class-specific domain discriminators to transfer rich source-domain representations to the industrial domain. Comprehensive experiments on SMOKE5K and custom IJmond datasets demonstrate that CEDANet with citizen feedback achieves an F1-score of 0.414 and a smoke-class IoU of 0.261, compared with 0.083 and 0.043, respectively, for the baseline model. This is roughly a five-fold increase in F1-score and a six-fold increase in smoke-class IoU. Notably, CEDANet with citizen-constrained pseudo-labels performs comparably to the same architecture trained on a limited set of 100 fully annotated images (F1-score of 0.418 and IoU of 0.264), showing that it can match small-sample fully supervised accuracy without pixel-level target-domain annotations. Our research validates the scalability and cost-efficiency of combining citizen science with weakly supervised domain adaptation, offering a practical solution for complex, data-scarce environmental monitoring applications.
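As a reading aid for the reported numbers, the sketch below computes a pixel-level F1-score and smoke-class IoU from binary masks using the standard definitions. How the paper aggregates these metrics across images is not stated here, so this is an assumption-laden illustration; the helper names (`confusion_counts`, `f1_score`, `smoke_iou`, `f1_from_iou`) are hypothetical.

```python
from typing import List, Tuple

Mask = List[List[int]]  # binary mask: 1 = smoke pixel, 0 = background


def confusion_counts(pred: Mask, target: Mask) -> Tuple[int, int, int]:
    """Count true positives, false positives, false negatives for smoke."""
    tp = fp = fn = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            if p == 1 and t == 1:
                tp += 1
            elif p == 1 and t == 0:
                fp += 1
            elif p == 0 and t == 1:
                fn += 1
    return tp, fp, fn


def f1_score(tp: int, fp: int, fn: int) -> float:
    """F1 = 2*TP / (2*TP + FP + FN); defined as 0 when there are no positives."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def smoke_iou(tp: int, fp: int, fn: int) -> float:
    """IoU = TP / (TP + FP + FN); defined as 0 when there are no positives."""
    denom = tp + fp + fn
    return tp / denom if denom else 0.0


def f1_from_iou(iou: float) -> float:
    """Identity that holds when F1 and IoU share the same pixel counts."""
    return 2 * iou / (1 + iou)


# Ratios behind the "five-fold" and "six-fold" statements in the abstract.
f1_gain = 0.414 / 0.083
iou_gain = 0.261 / 0.043
```

When both metrics are computed from the same pixel counts, they are linked by F1 = 2·IoU / (1 + IoU). The reported pairs (0.414, 0.261), (0.083, 0.043), and (0.418, 0.264) agree with this identity to within rounding, which suggests, though does not confirm, that both were derived from the same smoke-class counts.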
