Comparison Performance of Spectrogram and Scalogram as Input of Acoustic Recognition Task

Dang Thoai Phan

TL;DR

This paper addresses the lack of direct, side-by-side comparisons between spectrograms and scalograms as inputs for acoustic recognition. Using one experimental framework for both, the study transforms audio with the STFT and the CWT, feeds the resulting images into CNNs, and evaluates the models by AUC-ROC on the MIMII dataset. Spectrograms generally outperform scalograms, although scalograms offer advantages for non-stationary signals and incur a substantially higher computational cost. The results give practical guidance on feature choice and point to future work on aligning the outputs of the two transforms and on normalization strategies.

Abstract

Acoustic recognition has emerged as a prominent task in deep learning research, frequently utilizing spectral feature extraction techniques such as the spectrogram from the Short-Time Fourier Transform and the scalogram from the Wavelet Transform. However, there is a notable deficiency in studies that comprehensively discuss the advantages, drawbacks, and performance comparisons of these methods. This paper aims to evaluate the characteristics of these two transforms as input data for acoustic recognition using Convolutional Neural Networks. The performance of the trained models employing both transforms is documented for comparison. Through this analysis, the paper elucidates the advantages and limitations of each method, provides insights into their respective application scenarios, and identifies potential directions for further research.
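As a rough illustration of the two inputs being compared, the following sketch computes a magnitude spectrogram with a plain STFT and a magnitude scalogram with a Morlet CWT, using NumPy only. The window length, hop size, Morlet parameter `w0`, frequency grid, and the synthetic 1 kHz test tone are all assumptions chosen for the example; they are not the settings used in the paper, and the function names are hypothetical.

```python
import numpy as np

def spectrogram(x, n_fft=256, hop=128):
    """Magnitude STFT with a Hann window (fixed time-frequency resolution)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack(
        [x[i * hop : i * hop + n_fft] * window for i in range(n_frames)]
    )
    # One column per frame, one row per frequency bin.
    return np.abs(np.fft.rfft(frames, axis=1)).T

def scalogram(x, fs, freqs, w0=6.0):
    """Magnitude CWT with a complex Morlet wavelet, applied in the frequency domain."""
    n = len(x)
    X = np.fft.fft(x)
    omega = 2 * np.pi * np.fft.fftfreq(n, d=1.0 / fs)  # angular frequency in rad/s
    out = np.empty((len(freqs), n))
    for k, f in enumerate(freqs):
        s = w0 / (2 * np.pi * f)  # scale whose centre frequency is f
        # Morlet response at this scale, positive frequencies only (unnormalized).
        psi_hat = np.exp(-0.5 * (s * omega - w0) ** 2) * (omega > 0)
        out[k] = np.abs(np.fft.ifft(X * psi_hat))
    return out

# Assumed test signal: one second of a 1 kHz tone sampled at 8 kHz.
fs = 8000
t = np.arange(fs) / fs
x = np.sin(2 * np.pi * 1000 * t)

S = spectrogram(x)                                           # shape (129, 61)
W = scalogram(x, fs, np.array([250.0, 500.0, 1000.0, 2000.0]))  # shape (4, 8000)
```

The scalogram keeps one value per sample at every analysed frequency, while the spectrogram keeps one column per hop. This gives a sense of why the wavelet route tends to be more expensive and why its output needs resizing before the two can be fed to the same network.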

Paper Structure (23 sections, 6 equations, 6 figures, 5 tables)

Introduction
Theoretical foundation
Short-time Fourier transform
Wavelet transform
Time and frequency resolution of transforms
Uncertainty principle
Multiresolution
Experiment
Workflow
MIMII audio dataset
Audio normalization
Implementation of Short-Time Fourier Transform
Implementation of Continuous Wavelet Transform
Implementation of Convolutional Neural Networks
Benchmarking and performance evaluation
...and 8 more sections

Figures (6)

Figure 1: Visualization of spectrogram (a) and scalogram (b)
Figure 2: Time and frequency resolution of spectrogram and scalogram
Figure 3: Experimental Workflow
Figure 4: Effect of normalization technique
Figure 5: Performance benchmarking
...and 1 more figure
