Timbre Difference Capturing in Anomalous Sound Detection

Tomoya Nishida, Harsh Purohit, Kota Dohi, Takashi Endo, Yohei Kawaguchi

TL;DR

This work introduces a framework that explains differences in predefined timbre attributes instead of using free-form text captions, and develops a method that jointly conducts anomalous sound detection and timbre difference estimation based on a k-nearest neighbors method in an audio embedding space.

Abstract

This paper proposes a framework for explaining anomalous machine sounds in the context of anomalous sound detection (ASD). While ASD has been extensively explored, identifying how anomalous sounds differ from normal sounds is also beneficial for machine condition monitoring. However, existing sound difference captioning methods require anomalous sounds for training, which is impractical in typical machine condition monitoring settings where such sounds are unavailable. To solve this issue, we propose a new strategy for explaining anomalous differences that does not require anomalous sounds for training. Specifically, we introduce a framework that explains differences in predefined timbre attributes instead of using free-form text captions. Objective metrics of timbre attributes can be computed using timbral models developed through psychoacoustic research, which makes it possible to estimate which timbre attributes have changed from normal sounds, and how, without training machine learning models. Additionally, to accurately determine timbre differences regardless of variations in the normal training data, we developed a method that jointly conducts anomalous sound detection and timbre difference estimation based on a k-nearest neighbors method in an audio embedding space. Evaluation using the MIMII DG dataset demonstrated the effectiveness of the proposed method.

Paper Structure

This paper contains 11 sections, 5 equations, 3 figures, 2 tables.

Figures (3)

  • Figure 1: Illustration of the data distribution in the audio feature (or embedding) space and the aim of the proposed method. When the timbre metric value of an anomalous sample is compared with the whole normal data, the difference cannot be determined (red dot within the normal timbre distribution). When the timbre metric is compared only with neighboring normal samples in the feature space, the timbre difference can be determined (Target timbre diff.).
  • Figure 2: Overview of proposed joint UASD and timbre difference capturing method in inference phase.
  • Figure 3: MAE of timbre difference capturing (smaller is better). (a) Source domain, (b) Target domain
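
The idea in Figure 1, comparing a test sample's timbre metric only with its nearest normal neighbors in the embedding space, can be illustrated with a minimal sketch. This is not the authors' implementation: the Euclidean distance, the mean-distance anomaly score, the mean over neighbors, the value of `k`, the function name, and the toy data are all assumptions made for illustration, and the single timbre metric is a made-up scalar rather than the output of a real timbral model.

```python
import numpy as np


def knn_score_and_timbre_diff(test_emb, test_timbre, normal_embs, normal_timbres, k=3):
    """Hypothetical helper: joint anomaly score and timbre difference via k-NN.

    test_emb:       (D,)   embedding of the test sound
    test_timbre:    (M,)   timbre metric values of the test sound
    normal_embs:    (N, D) embeddings of normal training sounds
    normal_timbres: (N, M) timbre metric values of normal training sounds
    """
    # Euclidean distance from the test embedding to every normal embedding
    # (the distance metric is an assumption).
    dists = np.linalg.norm(normal_embs - test_emb, axis=1)
    # Indices of the k nearest normal samples.
    nn_idx = np.argsort(dists)[:k]
    # Anomaly score: mean distance to the k nearest neighbors (assumed aggregation).
    anomaly_score = float(dists[nn_idx].mean())
    # Timbre difference: test value minus the mean over the neighbors only.
    timbre_diff = test_timbre - normal_timbres[nn_idx].mean(axis=0)
    return anomaly_score, timbre_diff


# Toy normal data: two operating conditions that form two clusters in the
# embedding space and have different values of one timbre metric.
normal_embs = np.array(
    [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]]
)
normal_timbres = np.array([[1.0], [1.0], [1.0], [3.0], [3.0], [3.0]])

# Test sample close to the first cluster, with a timbre value between the clusters.
test_emb = np.array([2.0, 2.0])
test_timbre = np.array([2.0])

score, local_diff = knn_score_and_timbre_diff(
    test_emb, test_timbre, normal_embs, normal_timbres, k=3
)
# Comparing against all normal data hides the change in this toy example.
global_diff = test_timbre - normal_timbres.mean(axis=0)

print("anomaly score:", score)
print("timbre diff vs. neighbors:", local_diff)
print("timbre diff vs. all normal data:", global_diff)
```

In this toy setup the global comparison yields a difference of zero because the test value equals the mean over both clusters, while the neighbor-based comparison shows an increase relative to the nearby normal samples.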