
Soundscapes in Spectrograms: Pioneering Multilabel Classification for South Asian Sounds

Sudip Chakrabarty, Pappu Bishwas, Rajdeep Chatterjee, Tathagata Bandyopadhyay, Digonto Biswas, Bibek Howlader

TL;DR

This work introduces a spectrogram-based methodology that captures the complex, overlapping auditory patterns of South Asian soundscapes better than MFCC-based features, and implements a Convolutional Neural Network to solve a demanding multilabel, multiclass classification problem on the SAS-KIIT dataset.

Abstract

Environmental sound classification is a field of growing importance for urban monitoring and cultural soundscape analysis, especially within the acoustically rich environments of South Asia. These regions present a unique challenge because natural, human, and cultural sounds often overlap, straining traditional methods that typically rely on Mel Frequency Cepstral Coefficients (MFCC). This study introduces a novel spectrogram-based methodology that is better able to capture these complex auditory patterns. A Convolutional Neural Network (CNN) architecture is implemented to solve a demanding multilabel, multiclass classification problem on the SAS-KIIT dataset. To demonstrate robustness and comparability, the approach is also validated on the widely used UrbanSound8K dataset. The results show that the proposed spectrogram-based method significantly outperforms existing MFCC-based techniques, achieving higher classification accuracy on both datasets. This improvement lays the groundwork for more robust and accurate audio classification systems in real-world applications.
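To make the contrast with MFCCs concrete: MFCCs apply a discrete cosine transform to log-Mel band energies and usually keep only the first dozen or so coefficients, whereas a log-Mel spectrogram keeps the full time-by-frequency grid of band energies, which is the kind of input Figure 4 describes feeding to the CNN. The sketch below computes such a grid in plain Python. It is illustrative only: the frame size, hop length, number of Mel bands, the HTK-style mel formula and the toy sine-tone input are all assumptions rather than the paper's settings, and a naive DFT is used for readability. In practice a library routine such as `librosa.feature.melspectrogram` followed by `librosa.power_to_db` does the same job far faster.

```python
import cmath
import math


def hz_to_mel(f):
    """HTK-style mel scale (one common convention; an assumption here)."""
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def power_spectrogram(signal, n_fft=256, hop=128):
    """Power spectrum of Hann-windowed frames via a naive DFT (clear, not fast)."""
    n_bins = n_fft // 2 + 1
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / n_fft) for n in range(n_fft)]
    # Precompute DFT twiddle factors e^{-2*pi*i*k*n/N} for the non-negative bins.
    twiddle = [[cmath.exp(-2j * math.pi * k * n / n_fft) for n in range(n_fft)]
               for k in range(n_bins)]
    frames = []
    for start in range(0, len(signal) - n_fft + 1, hop):
        frame = [signal[start + n] * window[n] for n in range(n_fft)]
        frames.append([abs(sum(x * t for x, t in zip(frame, row))) ** 2 for row in twiddle])
    return frames  # [num_frames][n_bins]


def mel_filterbank(sr, n_fft, n_mels):
    """Triangular filters spaced evenly on the mel scale from 0 Hz to sr/2."""
    top = hz_to_mel(sr / 2)
    hz_points = [mel_to_hz(i * top / (n_mels + 1)) for i in range(n_mels + 2)]
    bin_freqs = [k * sr / n_fft for k in range(n_fft // 2 + 1)]
    bank = []
    for m in range(1, n_mels + 1):
        left, center, right = hz_points[m - 1], hz_points[m], hz_points[m + 1]
        row = []
        for f in bin_freqs:
            if left <= f <= center:
                row.append((f - left) / (center - left))    # rising edge
            elif center < f <= right:
                row.append((right - f) / (right - center))  # falling edge
            else:
                row.append(0.0)
        bank.append(row)
    return bank  # [n_mels][n_bins]


def log_mel_spectrogram(signal, sr, n_fft=256, hop=128, n_mels=32, eps=1e-10):
    """Mel-band energies in dB: the 2-D 'image' a CNN would consume."""
    bank = mel_filterbank(sr, n_fft, n_mels)
    return [[10.0 * math.log10(sum(w * p for w, p in zip(row, spectrum)) + eps)
             for row in bank]
            for spectrum in power_spectrogram(signal, n_fft, hop)]  # [num_frames][n_mels]


if __name__ == "__main__":
    sr = 8000  # toy sample rate (assumption)

    def tone(hz, n=2048):
        return [math.sin(2 * math.pi * hz * i / sr) for i in range(n)]

    def peak_band(spec):
        """Index of the loudest Mel band in the first frame."""
        return max(range(len(spec[0])), key=lambda b: spec[0][b])

    low = log_mel_spectrogram(tone(1000), sr)
    high = log_mel_spectrogram(tone(3000), sr)
    print(len(low), len(low[0]))             # 15 32
    print(peak_band(low) < peak_band(high))  # True
```

Keeping every Mel band for every frame, instead of compressing each frame into a few cepstral coefficients, leaves the time-frequency layout of overlapping sources visible to convolutional filters; that is one plausible reading of why the spectrogram representation helps on overlapping South Asian soundscapes.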

Paper Structure

This paper contains 21 sections, 5 equations, 5 figures, and 5 tables.

Figures (5)

  • Figure 1: t-SNE: A Representation of Class Distributions in the Feature Space for SAS-KIIT Dataset.
  • Figure 2: t-SNE: A Representation of Class Distributions in the Feature Space for UrbanSound8K Dataset.
  • Figure 3: Audio Mixing Process and Label Interactions.
  • Figure 4: Mixed audio is converted to Mel-spectrograms and fed into a CNN with ReLU and max pooling to extract features; dense layers output multilabel predictions, with dusty pink highlighting detected sound classes in an example recording.
  • Figure 5: Log-Scale Representation of Predicted Audio Classes: orange bars indicate classes detected in the sample audio; sky-blue bars show other predicted classes with lower probability scores; the dashed black line (- - -) marks the decision threshold, above which classes are considered present.
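Figures 3 and 5 outline the multilabel side of the pipeline: clips are mixed so that one recording carries several labels, and at inference each class score is compared against a decision threshold. The sketch below shows one common way to do both. It is a hedged illustration, not the paper's code: summing clips with unit gains and rescaling only on clipping, taking the union of source labels as the multi-hot target, using an independent per-class sigmoid rather than a softmax, the 0.5 threshold, and the class names (`birdsong`, `temple_bell`, `traffic`) are all hypothetical choices made for the example.

```python
import math


def mix_clips(clips, gains=None):
    """Sum equal-length clips sample by sample; rescale only if the mix would clip."""
    length = len(clips[0])
    if any(len(c) != length for c in clips):
        raise ValueError("all clips must have the same number of samples")
    gains = gains if gains is not None else [1.0] * len(clips)
    mixed = [sum(g * c[i] for g, c in zip(gains, clips)) for i in range(length)]
    peak = max(abs(x) for x in mixed)
    if peak > 1.0:
        mixed = [x / peak for x in mixed]  # peak-normalise into [-1, 1]
    return mixed


def multi_hot(label_sets, class_names):
    """Target vector for a mix: the union of its source clips' labels."""
    present = set().union(*label_sets)
    return [1 if name in present else 0 for name in class_names]


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def decode(logits, class_names, threshold=0.5):
    """Independent per-class sigmoid, then keep every class at or above the threshold."""
    probs = [sigmoid(z) for z in logits]
    detected = [name for name, p in zip(class_names, probs) if p >= threshold]
    return detected, probs


if __name__ == "__main__":
    classes = ["birdsong", "temple_bell", "traffic"]  # hypothetical class names
    mixed = mix_clips([[0.5, -0.5, 0.5, -0.5], [0.8, 0.8, -0.8, -0.8]])
    target = multi_hot([{"birdsong"}, {"traffic"}], classes)
    detected, probs = decode([2.0, -3.0, 0.4], classes)  # made-up logits
    print(target)    # [1, 0, 1]
    print(detected)  # ['birdsong', 'traffic']
```

A per-class sigmoid is used because, unlike a softmax, it does not force the scores to compete, so several classes can clear the threshold at once, which is what a mixed recording with overlapping sources requires.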