Serial-OE: Anomalous sound detection based on serial method with outlier exposure capable of using small amounts of anomalous data for training

Ibuki Kuroyanagi, Tomoki Hayashi, Kazuya Takeda, Tomoki Toda

TL;DR

Serial-OE addresses anomalous sound detection (ASD) with an outlier-exposure framework that pairs one DNN feature extractor per machine type with one GMM detector per machine ID, which allows training with small amounts of anomalous data. Trained on normal and pseudo-anomalous data (and optionally real anomalous data), it achieves competitive or superior ASD performance on DCASE2020 Task2 while remaining robust to data contamination and able to operate without machine IDs. The method uses Mixup, ImageNet pretraining, and a norm-based type loss to shape a feature space that is discriminative yet suitable for generative modeling, together with a novel aggregation strategy that captures both stationary and non-stationary anomalies. The results suggest practical benefits for real-world ASD systems, including improved performance with minimal anomalous data and resilience to data imperfections, though further work is needed on domain shift and edge deployment.

Abstract

We introduce Serial-OE, a new approach to anomalous sound detection (ASD) that leverages small amounts of anomalous data to improve performance. Conventional ASD methods rely primarily on modeling normal data, because collecting anomalous data for the many possible types of equipment breakdown is costly. Our method improves upon existing ASD systems by implementing an outlier exposure framework that uses normal and pseudo-anomalous data for training and can additionally use small amounts of real anomalous data. A comprehensive evaluation on the DCASE2020 Task2 dataset shows that our method outperforms state-of-the-art ASD models. We also investigate how performance is affected by using a small amount of anomalous data during training, by using data without machine ID information, and by using contaminated training data. Our experimental results show that even a very limited amount of anomalous data during training can help overcome the limitations of existing methods, which train only on normal data because anomalous data are scarce. This study contributes to the field by presenting a method that can be dynamically adapted to include anomalous data during the operational phase of an ASD system, paving the way for more accurate ASD.

Paper Structure

This paper contains 21 sections, 8 equations, 5 figures, 5 tables.

Figures (5)

  • Figure 1: Overview of the proposed Serial-OE method. The gray areas represent the training phase, while the white area represents the inference phase. This example shows how to train the ASD system to detect anomalies in the sound produced by machine type Fan. Section 1 shows the training of feature extractor $f$, which is trained by applying two loss functions to the features extracted from a mixture of normal and pseudo-anomalous sounds, as sampled by a batch sampler in mini-batches. Section 2 shows the training of anomaly detector $h$. Since a different anomaly detector $h$ is trained for each machine ID, here we train the anomaly detector using the features of normal audio from Fan ID 0 obtained from pretrained feature extractor $f$. Section 3 shows the inference process used to obtain the anomaly score, which is computed by dividing the test sample into multiple chunks, each of which is fed into pretrained feature extractor $f$ and trained anomaly detector $h$, with the resulting anomaly scores aggregated to obtain the final score.
  • Figure 2: Feature space obtained when using a feature extractor for machine type ‘Fan’. The symbols $\circ$ and $\bigtriangleup$ represent the normal and pseudo-anomalous data used for training, respectively. The pseudo-anomalous data are distributed near the origin (i.e., center) of the hypersphere after performing binary classification based on the norm of $\mathcal{L}_\mathrm{type}$, while the normal data are distributed farther from the origin. Data for each machine ID are distributed into separate clusters using $\mathcal{L}_\mathrm{id}$. For the purpose of illustrating our process when using the obtained feature space, normal data for Fan ID 0 are represented with a red $\circ$, while anomalous data for Fan ID 0 are denoted by $\times$. The black dashed circles represent the areas where anomalous data are distributed. It is assumed that anomalous data for Fan ID 0 will cluster around the normal data in the feature space if their characteristics are similar to the normal data (e.g., if the anomaly is caused by a slight scratch), and around the pseudo-anomalous data cluster if their characteristics differ significantly from the normal data, suggesting a possible breakdown. The proposed method detects anomalous data by modeling the normal data for Fan ID 0 using a GMM, in relation to such a feature space.
  • Figure 3: Visualizations of data for machine type Fan using t-SNE, when varying loss function $\mathcal{L}_\mathrm{id}$ of the proposed method. (a) Feature space distribution obtained using BCE. (b) Feature space distribution obtained using cross-entropy. (c) Feature space distribution obtained using SCAdaCos [Wilkinghoff2021a]. (d) Feature space distribution obtained using ArcFace [ArcFace]. The symbols $\circ$ and $\times$ denote normal and anomalous sounds, respectively. The normal and anomalous feature space distributions for machine ID 0 are encircled by red dotted lines.
  • Figure 4: Relationship between the ratio of anomalous to normal data used during training and ASD performance, when using two different performance evaluation metrics: (a) aAUC [%] and (b) mAUC [%]. Error bars represent the standard error obtained from five calculations with different seeds.
  • Figure 5: Relationship between the ratio of anomalous to normal data used during training and ASD performance when the anomalous data was contaminated with various amounts of normal data, using two different evaluation metrics: (a) aAUC [%], and (b) mAUC [%]. Error bars represent the standard error obtained from five calculations with different seeds.
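
The inference flow described in the Figure 1 caption (split a test clip into chunks, pass each chunk through the feature extractor $f$ and the per-ID detector $h$, then aggregate the chunk scores) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration: `extract_feature` is a hypothetical two-number stand-in for the pretrained DNN, a single diagonal Gaussian stands in for the per-ID GMM, and mean and max pooling are generic aggregation choices rather than the paper's aggregation strategy.

```python
import math
import random
from statistics import fmean, pvariance


def split_into_chunks(signal, chunk_len, hop):
    """Slice a 1-D signal into fixed-length, possibly overlapping chunks."""
    last_start = len(signal) - chunk_len
    return [signal[s:s + chunk_len] for s in range(0, last_start + 1, hop)]


def extract_feature(chunk):
    """Hypothetical stand-in for the feature extractor f: (mean, RMS)."""
    return (fmean(chunk), math.sqrt(fmean(x * x for x in chunk)))


def fit_gaussian(features):
    """Fit a diagonal Gaussian, a one-component stand-in for the GMM h."""
    dims = list(zip(*features))
    means = [fmean(d) for d in dims]
    variances = [pvariance(d) + 1e-6 for d in dims]  # floor avoids log(0)
    return means, variances


def negative_log_likelihood(feature, model):
    """Chunk-level anomaly score: higher means less like the normal data."""
    means, variances = model
    return sum(
        0.5 * (math.log(2 * math.pi * v) + (x - m) ** 2 / v)
        for x, m, v in zip(feature, means, variances)
    )


def anomaly_score(signal, model, chunk_len=64, hop=32):
    """Score every chunk, then aggregate (mean and max are assumptions)."""
    chunks = split_into_chunks(signal, chunk_len, hop)
    scores = [negative_log_likelihood(extract_feature(c), model) for c in chunks]
    return {"mean": fmean(scores), "max": max(scores)}


def make_clip(rng, n=640):
    """Synthetic 'normal' clip: unit-variance Gaussian noise."""
    return [rng.gauss(0.0, 1.0) for _ in range(n)]


rng = random.Random(0)

# Train the detector on features of normal clips from one machine ID.
train_features = [
    extract_feature(chunk)
    for _ in range(20)
    for chunk in split_into_chunks(make_clip(rng), 64, 32)
]
model = fit_gaussian(train_features)

# A held-out normal clip and a clip with a short, loud burst added.
normal_clip = make_clip(rng)
burst_clip = make_clip(rng)
for i in range(300, 340):
    burst_clip[i] += 10.0

normal_result = anomaly_score(normal_clip, model)
burst_result = anomaly_score(burst_clip, model)
print(normal_result, burst_result)
```

In this toy setup the burst affects only a few chunks, so the max-pooled score reacts to it most strongly, while the mean-pooled score is better suited to deviations that persist across the whole clip. That contrast is the motivation for combining aggregation views when both stationary and non-stationary anomalies must be caught.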