Transferring self-supervised pre-trained models for SHM data anomaly detection with scarce labeled data
Mingyuan Zhou, Xudong Jian, Ye Xia, Zhilu Lai
TL;DR
This work tackles label scarcity in SHM data anomaly detection by pre-training models with self-supervised learning (SSL) on large volumes of unlabeled SHM data and then fine-tuning them on a small labeled set. It investigates three SSL families (generative, contrastive, and generative-contrastive) and shows that pre-training, particularly with an autoencoder (AE), substantially improves detection performance on datasets from two real bridges. While the AE generally yields the best results, some SSL methods exhibit negative transfer under imbalanced conditions, underscoring the importance of data distribution and minority-pattern detection. The approach offers a practical, low-labeling-burden path to preliminary SHM data cleansing and anomaly detection at scale, with code and data intended for public sharing.
Abstract
Structural health monitoring (SHM) has advanced significantly in recent decades and has accumulated massive amounts of monitoring data. Anomalies inevitably exist in these data and pose significant challenges to their effective use. Recently, deep learning has emerged as an efficient and effective approach to anomaly detection in bridge SHM. Despite this progress, many deep learning models require large amounts of labeled data for training. Labeling data, however, is labor-intensive, time-consuming, and often impractical for large-scale SHM datasets. To address these challenges, this work explores self-supervised learning (SSL), an emerging paradigm that combines unsupervised pre-training with supervised fine-tuning. The SSL-based framework aims to learn from only a very small quantity of labeled data through fine-tuning, while making the best use of the vast amount of unlabeled SHM data through pre-training. Mainstream SSL methods are compared and validated on the SHM data of two in-service bridges. Comparative analysis demonstrates that SSL techniques boost data anomaly detection performance, achieving higher F1 scores than conventional supervised training, especially when very little labeled data is available. This work demonstrates the effectiveness and superiority of SSL techniques on large-scale SHM data, providing an efficient tool for preliminary anomaly detection with scarce label information.
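The F1 score used for comparison above matters because anomaly classes in monitoring data are typically rare. The sketch below is an illustration only, not the authors' evaluation code: the class names, the toy label counts, and the choice of macro-averaging are all assumptions, since the abstract does not state how F1 is averaged. It shows how a detector that always predicts the majority class can reach high accuracy while its macro-averaged F1 stays low.

```python
def f1_for_class(y_true, y_pred, label):
    """F1 score for one class, treating `label` as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    if tp == 0:
        # No true positives: precision and recall are zero or undefined.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over the classes present in y_true."""
    labels = sorted(set(y_true))
    return sum(f1_for_class(y_true, y_pred, c) for c in labels) / len(labels)


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)


# Hypothetical, imbalanced toy labels (names and counts are made up).
y_true = ["normal"] * 90 + ["missing"] * 8 + ["outlier"] * 2

# A degenerate detector that ignores the minority patterns entirely.
y_majority = ["normal"] * len(y_true)

acc = accuracy(y_true, y_majority)
mf1 = macro_f1(y_true, y_majority)
print(f"accuracy = {acc:.3f}, macro-F1 = {mf1:.3f}")
```

On these toy labels the accuracy is 0.9, while the macro-F1 is roughly 0.32, because the two minority classes each contribute an F1 of zero. This gap is why a per-class or averaged F1 is a more informative measure than accuracy when minority anomaly patterns are the target.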
