SIAVC: Semi-Supervised Framework for Industrial Accident Video Classification

Zuoyong Li, Qinghua Lin, Haoyi Fan, Tiesong Zhao, David Zhang

TL;DR

This work tackles industrial accident video classification under limited labeled data by proposing SIAVC, a semi-supervised framework that combines a Super Augmentation Block (SAB) and a Video Cross-set Augmentation Module (VCAM). SAB re-augments well-learned strongly augmented samples using Gaussian noise and random masking guided by historical losses, while VCAM expands the training data by interpolating high-confidence unlabeled samples with labeled ones to generate diverse pseudo-label samples. The authors introduce the ECA9 dataset of hub-level express center accidents and show that SIAVC outperforms state-of-the-art semi-supervised methods on both the ECA9 and Fire Detection benchmarks, reaching 88.76% and 89.13% accuracy, respectively, with ablations confirming the benefits of SAB and VCAM. They plan to release the code and dataset.

Abstract

Semi-supervised learning suffers from the imbalance of labeled and unlabeled training data in the video surveillance scenario. In this paper, we propose a new semi-supervised learning method called SIAVC for industrial accident video classification. Specifically, we design a video augmentation module called the Super Augmentation Block (SAB). SAB adds Gaussian noise and randomly masks video frames according to historical loss on the unlabeled data for model optimization. Then, we propose a Video Cross-set Augmentation Module (VCAM) to generate diverse pseudo-label samples from the high-confidence unlabeled samples, which alleviates the mismatch of sampling experience and provides high-quality training data. Additionally, we construct a new industrial accident surveillance video dataset with frame-level annotation, namely ECA9, to evaluate our proposed method. Compared with the state-of-the-art semi-supervised learning based methods, SIAVC demonstrates outstanding video classification performance, achieving 88.76% and 89.13% accuracy on ECA9 and Fire Detection datasets, respectively. The source code and the constructed dataset ECA9 will be released at https://github.com/AlchemyEmperor/SIAVC.
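
The SAB behaviour described in the abstract (Gaussian noise plus random masking, triggered by historical loss) can be illustrated with a minimal sketch. This is not the authors' implementation. The class name `SuperAugmentationSketch`, the use of the mean of recent losses as the threshold, whole-frame masking, and all parameter values are assumptions made for illustration only. A clip is represented as a list of frames, each a flat list of floats, to keep the example dependency-free.

```python
import random
from collections import deque
from statistics import mean


class SuperAugmentationSketch:
    """Hypothetical loss-gated re-augmentation; not the SIAVC code."""

    def __init__(self, history_size=10, noise_std=0.1, mask_ratio=0.25, seed=0):
        self.history = deque(maxlen=history_size)  # recent consistency losses
        self.noise_std = noise_std                 # assumed noise scale
        self.mask_ratio = mask_ratio               # assumed share of masked frames
        self.rng = random.Random(seed)

    def threshold(self):
        # Assumption: the threshold is the mean of the recorded losses.
        return mean(self.history) if self.history else 0.0

    def __call__(self, clip, consistency_loss):
        # Gate on the history seen so far, then record the new loss.
        well_learned = consistency_loss < self.threshold()
        self.history.append(consistency_loss)
        if not well_learned:
            return clip  # sample still yields a useful loss; leave it as is

        n_masked = int(len(clip) * self.mask_ratio)
        masked = set(self.rng.sample(range(len(clip)), n_masked))
        out = []
        for t, frame in enumerate(clip):
            if t in masked:
                out.append([0.0] * len(frame))  # mask the whole frame
            else:
                out.append([p + self.rng.gauss(0.0, self.noise_std) for p in frame])
        return out
```

In this sketch a sample is re-augmented only when its consistency loss drops below the running threshold, which mirrors the motivation that well-learned strongly augmented samples contribute a near-zero loss.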

Paper Structure (18 sections, 13 equations, 9 figures, 6 tables)

Figures (9)

  • Figure 1: Motivation of the proposed method. (a) The model's high confidence with strongly augmented samples results in a consistency loss close to zero. (b) Interpolation between labeled and unlabeled data to generate pseudo-label data.
  • Figure 2: Overview of SIAVC. We first augment unlabeled samples to obtain their weakly augmented and strongly augmented counterparts. Then, we use VCAM to interpolate between unlabeled and labeled samples to generate pseudo-label samples. After cube embedding and encoding, the classifier and discriminator output predictions for these samples. Next, we compute the consistency and fairness losses for the predictions on strongly and weakly augmented samples. We compute the classification loss for labeled samples and the adversarial loss for pseudo-label samples. Finally, we use SAB to re-augment the strongly augmented samples according to the historical loss and the consistency loss for the next iteration.
  • Figure 3: SAB adds Gaussian noise and applies random masking to strongly augmented samples when the consistency loss is lower than a threshold computed based on historical losses.
  • Figure 4: VCAM uses high-confidence unlabeled samples as labeled samples for interpolation in videos, generating more diverse pseudo-label samples.
  • Figure 5: Express Center Accidents 9 dataset construction process.
  • ...and 4 more figures
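
The cross-set interpolation summarized in the Figure 1(b) and Figure 4 captions can be sketched as a mixup-style blend between a labeled clip and a high-confidence unlabeled clip. The captions do not specify the interpolation rule, so the Beta-distributed mixing weight, the confidence threshold of 0.95, the random pairing, and the function name `cross_set_interpolate` are all assumptions rather than the paper's method. Clips are flat lists of floats for brevity.

```python
import random


def cross_set_interpolate(labeled, unlabeled, num_classes,
                          conf_threshold=0.95, alpha=0.75, seed=0):
    """Hypothetical mixup-style sketch; not the SIAVC code.

    labeled:   list of (clip, class_index)
    unlabeled: list of (clip, predicted_class_probabilities)
    Returns a list of (mixed_clip, soft_label) pairs.
    """
    rng = random.Random(seed)
    mixed = []
    for clip_u, probs in unlabeled:
        confidence = max(probs)
        if confidence < conf_threshold:
            continue  # keep only high-confidence unlabeled samples
        pseudo_class = probs.index(confidence)

        # Assumption: pair with a randomly chosen labeled clip.
        clip_l, class_l = rng.choice(labeled)
        # Assumption: mixing weight drawn from Beta(alpha, alpha).
        lam = rng.betavariate(alpha, alpha)

        clip_m = [lam * a + (1.0 - lam) * b for a, b in zip(clip_l, clip_u)]
        label_m = [0.0] * num_classes
        label_m[class_l] += lam
        label_m[pseudo_class] += 1.0 - lam
        mixed.append((clip_m, label_m))
    return mixed
```

Each returned pair blends the clip values and the labels with the same weight, so the soft label always sums to one; low-confidence unlabeled clips are skipped entirely.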