Semi-Supervised Pipe Video Temporal Defect Interval Localization

Zhu Huang, Gang Pan, Chao Kang, YaoZhi Lv

TL;DR

This paper tackles temporal defect interval localization in sewer pipe CCTV videos, where time-interval annotations are scarce and the dynamics of pipe videos differ from those in standard temporal action localization (TAL) tasks. It introduces PipeSPO, a semi-supervised framework that pairs an unsupervised contrastive pretext task, which learns robust video sequence representations, with a semi-supervised, multi-prototype temporal localization stage guided by monocular visual odometry. Key contributions include a clustering-based multi-prototype memory, a prototype-aware decoder, a visual odometry attention module, and 3D discrete wavelet transform (3D-DWT) dynamic features that capture temporal textures; together, these components improve interval localization. On real-world datasets, PipeSPO achieves an average AP of 41.89% across IoU thresholds of 0.1–0.7, outperforming state-of-the-art methods by 8.14% and demonstrating practical value for CCTV temporal defect interval localization (CTDIL) in sewer maintenance.

Abstract

In sewer pipe Closed-Circuit Television (CCTV) inspection, accurate temporal defect localization is essential for effective defect classification, detection, segmentation, and quantification. Industry standards typically do not require time-interval annotations, even though these are more informative than time-point annotations for defect localization; fully supervised methods therefore incur additional annotation costs. Additionally, differences in scene types and camera motion patterns between pipe inspection and Temporal Action Localization (TAL) hinder the effective transfer of point-supervised TAL methods. This study therefore introduces a Semi-supervised multi-Prototype-based method incorporating visual Odometry for enhanced attention guidance (PipeSPO). PipeSPO fully leverages unlabeled data through unsupervised pretext tasks and exploits time-point annotated data with a weakly supervised multi-prototype-based method, relying on visual odometry features to capture camera pose information. Experiments on real-world datasets show that PipeSPO achieves 41.89% average precision across Intersection over Union (IoU) thresholds of 0.1–0.7, an improvement of 8.14% over current state-of-the-art methods.
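The headline metric averages average precision over several temporal IoU (tIoU) thresholds. The sketch below shows one common ActivityNet-style protocol for a single defect class; it assumes greedy, score-ordered matching, all-point interpolated AP, and thresholds 0.1, 0.2, …, 0.7 in steps of 0.1 (the step size is an assumption). The function names are hypothetical, and the paper's evaluation code may differ in details.

```python
from typing import List, Sequence, Tuple

Interval = Tuple[float, float]       # (start, end), e.g. in seconds
Prediction = Tuple[Interval, float]  # (interval, confidence score)


def tiou(a: Interval, b: Interval) -> float:
    """Temporal IoU of two closed intervals."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0


def average_precision(preds: List[Prediction], gts: List[Interval], thr: float) -> float:
    """Single-class AP at one tIoU threshold (all-point interpolation)."""
    if not gts:
        return 0.0
    ranked = sorted(preds, key=lambda p: p[1], reverse=True)
    matched = [False] * len(gts)
    precisions: List[float] = []
    recalls: List[float] = []
    tp = fp = 0
    for interval, _score in ranked:
        # Match to the unmatched ground truth with the highest tIoU, if >= thr.
        best_iou, best_j = 0.0, -1
        for j, gt in enumerate(gts):
            if not matched[j]:
                iou = tiou(interval, gt)
                if iou > best_iou:
                    best_iou, best_j = iou, j
        if best_j >= 0 and best_iou >= thr:
            matched[best_j] = True
            tp += 1
        else:
            fp += 1
        precisions.append(tp / (tp + fp))
        recalls.append(tp / len(gts))
    # Precision envelope: make precision non-increasing from right to left.
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])
    # Sum precision times each recall increment.
    ap, prev_recall = 0.0, 0.0
    for p, r in zip(precisions, recalls):
        ap += p * (r - prev_recall)
        prev_recall = r
    return ap


def mean_ap(preds: List[Prediction], gts: List[Interval],
            thresholds: Sequence[float]) -> float:
    """Average of the per-threshold APs."""
    return sum(average_precision(preds, gts, t) for t in thresholds) / len(thresholds)


# Assumed thresholds 0.1, 0.2, ..., 0.7 (step 0.1).
THRESHOLDS = [round(0.1 * i, 1) for i in range(1, 8)]
```

Sorting by confidence before matching makes the metric reward good ranking as well as tight boundaries, and averaging over thresholds balances loose (0.1) and strict (0.7) localization.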

Paper Structure (28 sections, 16 equations, 3 figures, 6 tables)

Figures (3)

  • Figure 1: Illustration of the CCTV robot used in sewer pipe inspection. When no pipe defect is observed, the robot moves in a straight line with the camera facing forward. Upon observing a pipe defect, it typically stops and rotates the camera to align with the defect, indicating a strong relationship between camera pose changes and the occurrence of pipe defects. Visual odometry features can exploit this prior knowledge to guide the model's attention, enhancing its ability to locate pipe defects.
  • Figure 2: Distribution of video lengths. Only the test set contains time-interval-level annotations; the remaining data contain only time-point-level annotations.
  • Figure 3: PipeSPO architecture. Network modules with the same color share weights. PipeSPO consists of two stages: the first is an unsupervised pretext task that trains a video frame sequence encoder on unlabeled videos; the second performs semi-supervised temporal defect interval localization, using the clustering-based multi-prototype memory and the prototype perception module, and incorporating camera pose information to guide the network. For details, see the PipeSPO architecture section.
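The camera-pose prior in Figure 1 (the robot advances while nothing is wrong, then stops and turns the camera toward a defect) can be made concrete as a simple attention signal. The sketch below is a hand-crafted, hypothetical illustration, not the paper's learned visual odometry attention module: it assumes visual odometry yields per-frame yaw and pitch in degrees plus cumulative forward displacement in metres, scores frames that rotate while barely advancing, and normalises the scores with a temperature-scaled softmax. `motion_cues`, `odometry_attention`, `attend`, `fwd_scale`, and `temperature` are illustrative names.

```python
import math
from typing import List, Tuple

# Hypothetical per-frame pose from monocular visual odometry:
# (yaw in degrees, pitch in degrees, cumulative forward displacement in metres).
Pose = Tuple[float, float, float]


def motion_cues(poses: List[Pose], fwd_scale: float = 1.0) -> List[float]:
    """Per-frame cue that is high when the camera rotates while the robot barely advances."""
    cues = [0.0]  # the first frame has no predecessor to compare against
    for prev, cur in zip(poses, poses[1:]):
        rotation = math.hypot(cur[0] - prev[0], cur[1] - prev[1])  # degrees turned
        advance = abs(cur[2] - prev[2])                              # metres travelled
        cues.append(rotation / (1.0 + fwd_scale * advance))
    return cues


def softmax(xs: List[float], temperature: float = 1.0) -> List[float]:
    """Numerically stable, temperature-scaled softmax."""
    m = max(xs)
    exps = [math.exp((x - m) / temperature) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def odometry_attention(poses: List[Pose], temperature: float = 10.0) -> List[float]:
    """Normalise motion cues into attention weights that sum to 1 over frames."""
    return softmax(motion_cues(poses), temperature)


def attend(features: List[List[float]], weights: List[float]) -> List[float]:
    """Attention-weighted temporal pooling of per-frame feature vectors."""
    dim = len(features[0])
    return [sum(w * f[d] for f, w in zip(features, weights)) for d in range(dim)]


if __name__ == "__main__":
    # Robot advances, stops to rotate toward a defect, then resumes.
    poses = [(0.0, 0.0, 0.0), (0.0, 0.0, 0.5), (0.0, 0.0, 1.0),
             (30.0, 0.0, 1.0), (60.0, 0.0, 1.0), (60.0, 0.0, 1.5)]
    print([round(w, 3) for w in odometry_attention(poses)])
```

Dividing the rotation by `1 + fwd_scale * advance` encodes "stopped and turning"; in a learned model, such pose features would more plausibly feed an attention layer than replace it with a fixed formula.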