Transferring self-supervised pre-trained models for SHM data anomaly detection with scarce labeled data

Mingyuan Zhou, Xudong Jian, Ye Xia, Zhilu Lai

TL;DR

This work tackles label scarcity in SHM data anomaly detection by applying self-supervised pre-training to large unlabeled SHM streams, followed by fine-tuning on a small labeled set. It investigates three SSL families (generative, contrastive, and generative-contrastive) and demonstrates that pre-training, particularly with an autoencoder (AE), substantially improves detection performance on datasets from two real bridges. While the AE generally yields the best results, some SSL methods exhibit negative transfer under imbalanced conditions, underscoring the importance of data distribution and minority-pattern detection. The approach offers a practical path, with a low labeling burden, to preliminary SHM data cleansing and anomaly detection at scale, with code and data intended for public sharing.

Abstract

Structural health monitoring (SHM) has experienced significant advancements in recent decades, accumulating massive monitoring data. Data anomalies inevitably exist in monitoring data, posing significant challenges to their effective utilization. Recently, deep learning has emerged as an efficient and effective approach for anomaly detection in bridge SHM. Despite this progress, many deep learning models require large amounts of labeled data for training. The process of labeling data, however, is labor-intensive, time-consuming, and often impractical for large-scale SHM datasets. To address these challenges, this work explores the use of self-supervised learning (SSL), an emerging paradigm that combines unsupervised pre-training and supervised fine-tuning. The SSL-based framework aims to learn from only a very small quantity of labeled data through fine-tuning, while making the best use of the vast amount of unlabeled SHM data through pre-training. Mainstream SSL methods are compared and validated on the SHM data of two in-service bridges. Comparative analysis demonstrates that SSL techniques boost data anomaly detection performance, achieving higher F1 scores than conventional supervised training, especially when only a very limited amount of labeled data is available. This work demonstrates the effectiveness of SSL techniques on large-scale SHM data, providing an efficient tool for preliminary anomaly detection with scarce label information.
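To illustrate why F1 is a more informative metric than accuracy when anomalous patterns are rare, the following is a minimal sketch in plain Python. The labels "normal" and "anomalous", the 95/5 class split, and the macro-averaging choice are assumptions made for illustration; they are not taken from the paper's datasets or its exact evaluation protocol.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def per_class_f1(y_true, y_pred):
    """Return {label: F1} for every label seen in y_true or y_pred."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class p, but it was wrong
            fn[t] += 1  # true class t was missed
    scores = {}
    for label in sorted(set(y_true) | set(y_pred)):
        denom = 2 * tp[label] + fp[label] + fn[label]
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the denominator is 0
        scores[label] = 2 * tp[label] / denom if denom else 0.0
    return scores


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1, so rare classes count equally."""
    scores = per_class_f1(y_true, y_pred)
    return sum(scores.values()) / len(scores)


# Hypothetical imbalanced test set: 95 normal samples, 5 anomalous ones.
y_true = ["normal"] * 95 + ["anomalous"] * 5
# A degenerate detector that labels everything as normal.
y_pred = ["normal"] * 100

print("accuracy:", accuracy(y_true, y_pred))            # 0.95
print("per-class F1:", per_class_f1(y_true, y_pred))
print("macro F1:", round(macro_f1(y_true, y_pred), 3))  # 0.487
```

In this toy case the detector reaches 95% accuracy while missing every anomaly, and the macro F1 drops to about 0.487 because the anomalous class scores zero. This is the kind of minority-pattern failure that an accuracy figure alone would hide.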

Paper Structure

This paper contains 22 sections, 16 equations, 10 figures, 6 tables.

Figures (10)

  • Figure 1: The workflow of the self-supervised learning framework for data anomaly detection. (Details about the IERFH feature are provided in Section \ref{sec:data_reduction}.)
  • Figure 2: Categories of pretext tasks in self-supervised learning: different pretext tasks aim to train an encoder for learning feature representations without requiring manually annotated labels.
  • Figure 3: Sensor network of the bridge in Case 1 (image credits tang2019convolutional).
  • Figure 4: Sensor network of the bridge in Case 2 (image credits jian2021faulty).
  • Figure 5: Illustration of SHM data patterns in Case 1.
  • ...and 5 more figures