Comparison Performance of Spectrogram and Scalogram as Input of Acoustic Recognition Task
Dang Thoai Phan
TL;DR
This paper addresses the lack of direct, side-by-side comparisons between spectrograms and scalograms as inputs for acoustic recognition. Using an identical experimental framework, it transforms audio with the Short-Time Fourier Transform (STFT) and the Continuous Wavelet Transform (CWT), feeds the resulting images into Convolutional Neural Networks (CNNs), and evaluates the models by AUC-ROC on the MIMII dataset. Spectrograms generally outperform scalograms; scalograms offer advantages for non-stationary signals but incur a substantially higher computational cost. The results provide practical guidance on feature choice and point to future work on aligning the outputs of the two transforms and exploring normalization strategies.
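AUC-ROC can be read as the probability that a randomly chosen positive (for example, anomalous) clip receives a higher score than a randomly chosen negative (normal) one, with ties counted as half. The sketch below computes it directly from that definition. It assumes binary labels (1 = anomalous, 0 = normal) and one score per clip; `roc_auc` is a hypothetical helper for illustration, not the paper's evaluation code.

```python
def roc_auc(labels, scores):
    """AUC-ROC as the Mann-Whitney probability that a positive outscores a negative.

    Assumes labels are 1 (positive/anomalous) or 0 (negative/normal).
    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Three of the four (positive, negative) pairs are ranked correctly.
    print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

The pairwise loop costs O(P·N) and is written for clarity; a sort-based or library implementation would normally be preferred for large evaluation sets.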
Abstract
Acoustic recognition has emerged as a prominent task in deep learning research and frequently relies on spectral feature extraction techniques such as the spectrogram, obtained from the Short-Time Fourier Transform, and the scalogram, obtained from the Wavelet Transform. However, few studies comprehensively discuss the advantages, drawbacks, and relative performance of these methods. This paper evaluates the characteristics of these two transforms as input data for acoustic recognition with Convolutional Neural Networks and reports the performance of models trained on each transform for comparison. Through this analysis, the paper clarifies the advantages and limitations of each method, offers insights into their respective application scenarios, and identifies potential directions for further research.
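The two inputs can be illustrated with a minimal NumPy sketch. It is not the paper's pipeline: the Hann window, `n_fft=512`, `hop=256`, the complex Morlet wavelet with `w0=6`, and the scale grid are assumptions chosen for illustration, and `spectrogram`, `morlet`, and `scalogram` are hypothetical helper names. The spectrogram takes one FFT per fixed-length frame, whereas the scalogram convolves the whole signal once per scale with a wavelet whose length grows with the scale, which is one way to see where the scalogram's higher computational cost comes from.

```python
import numpy as np


def spectrogram(x, n_fft=512, hop=256):
    """Power spectrogram |STFT|^2 with a Hann window; shape (n_fft // 2 + 1, n_frames)."""
    x = np.asarray(x, dtype=float)
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop  # assumes len(x) >= n_fft
    frames = np.stack([x[i * hop:i * hop + n_fft] * window for i in range(n_frames)])
    # One real FFT per frame; transpose so rows are frequency bins and columns are frames.
    return (np.abs(np.fft.rfft(frames, axis=1)) ** 2).T


def morlet(t, w0=6.0):
    """Complex Morlet wavelet (the small admissibility correction term is omitted)."""
    return np.pi ** -0.25 * np.exp(1j * w0 * t) * np.exp(-0.5 * t ** 2)


def scalogram(x, scales, w0=6.0):
    """Power scalogram |CWT|^2 with a complex Morlet wavelet; shape (len(scales), len(x))."""
    x = np.asarray(x, dtype=float)
    out = np.empty((len(scales), len(x)))
    for k, s in enumerate(scales):
        # Sample the wavelet over +/- 4 envelope widths, but never longer than the signal,
        # so that np.convolve(..., mode="same") returns exactly len(x) samples.
        half = min(int(np.ceil(4 * s)), (len(x) - 1) // 2)
        t = np.arange(-half, half + 1) / s
        psi = morlet(t, w0) / np.sqrt(s)
        # Correlating with conj(psi) equals convolving with its time-reversed conjugate.
        coeffs = np.convolve(x, np.conj(psi[::-1]), mode="same")
        out[k] = np.abs(coeffs) ** 2
    return out


if __name__ == "__main__":
    fs = 16000
    x = np.sin(2 * np.pi * 1000 * np.arange(fs) / fs)  # 1 s test tone at 1 kHz
    S = spectrogram(x)
    scales = np.arange(4, 64)
    W = scalogram(x, scales)
    print(S.shape, W.shape)  # (257, 61) (60, 16000)
    # Peak FFT bin -> Hz, and peak scale -> approximate frequency w0 * fs / (2 * pi * s)
    print(np.argmax(S.mean(axis=1)) * fs / 512)
    print(6.0 * fs / (2 * np.pi * scales[np.argmax(W.mean(axis=1))]))
```

For a Morlet wavelet, scale s corresponds approximately to the frequency f ≈ w0·fs/(2πs), so small scales analyze high frequencies with short wavelets and large scales analyze low frequencies with long ones. In this sketch the two outputs also have different shapes (one column per frame versus one per sample), so both would need resizing to a common image size before being fed to the same CNN.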
