Soundscapes in Spectrograms: Pioneering Multilabel Classification for South Asian Sounds

Sudip Chakrabarty; Pappu Bishwas; Rajdeep Chatterjee; Tathagata Bandyopadhyay; Digonto Biswas; Bibek Howlader

TL;DR

A novel spectrogram-based methodology is introduced to better capture the complex, overlapping auditory patterns of South Asian soundscapes, and a Convolutional Neural Network architecture is implemented to solve a demanding multilabel, multiclass classification problem on the SAS-KIIT dataset.

Abstract

Environmental sound classification is a field of growing importance for urban monitoring and cultural soundscape analysis, especially within the acoustically rich environments of South Asia. These regions present a unique challenge because multiple natural, human, and cultural sounds often overlap, which strains traditional methods that typically rely on Mel Frequency Cepstral Coefficients (MFCC). This study introduces a novel spectrogram-based methodology that is better able to capture these complex auditory patterns. A Convolutional Neural Network (CNN) architecture is implemented to solve a demanding multilabel, multiclass classification problem on the SAS-KIIT dataset. To demonstrate robustness and comparability, the approach is also validated on the widely used UrbanSound8K dataset. The results confirm that the proposed spectrogram-based method significantly outperforms existing MFCC-based techniques, achieving higher classification accuracy on both datasets. This improvement lays the groundwork for more robust and accurate audio classification systems in real-world applications.
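
As a rough illustration of what "multilabel" means for overlapping sounds, the sketch below encodes the set of classes present in a clip as a multi-hot target vector. The class names, the helper functions, and the rule that a mixture inherits the union of its sources' labels are all assumptions made for illustration; they are not taken from the SAS-KIIT label set or from the authors' mixing procedure.

```python
# Hypothetical class names for illustration only; not the SAS-KIIT label set.
CLASSES = ["rain", "traffic", "temple_bell", "crowd"]


def multi_hot(active, classes=CLASSES):
    """Return a 0/1 vector with a 1 for every class present in the clip."""
    unknown = set(active) - set(classes)
    if unknown:
        raise ValueError(f"unknown labels: {sorted(unknown)}")
    return [1 if name in active else 0 for name in classes]


def mix_labels(labels_a, labels_b):
    """Assumed rule: a mixed clip carries the union of its sources' labels."""
    return sorted(set(labels_a) | set(labels_b))


# A clip of rain mixed with a clip containing traffic and a crowd.
target = multi_hot(mix_labels(["rain"], ["traffic", "crowd"]))
```

Unlike a single-label setup, where exactly one entry of the target would be 1, several entries can be active at once, which is why each class is usually scored independently.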

Paper Structure (21 sections, 5 equations, 5 figures, 5 tables)

Introduction
Literature Survey and Related Works
Datasets and Preprocessing
SAS-KIIT Dataset
Dataset Overview
Organization and Characteristics
UrbanSound8K Dataset
Dataset Overview
Organization and Characteristics
Audio Mixing Process
Proposed Methodology
Mel Spectrogram Generation
Computation of MFCC Features
Input Preparation
Model Architecture
...and 6 more sections

Figures (5)

Figure 1: t-SNE: A Representation of Class Distributions in the Feature Space for SAS-KIIT Dataset.
Figure 2: t-SNE: A Representation of Class Distributions in the Feature Space for UrbanSound8K Dataset.
Figure 3: Audio Mixing Process and Label Interactions.
Figure 4: Mixed audio is converted to Mel-spectrograms and fed into a CNN with ReLU and max pooling to extract features; dense layers output multilabel predictions, with dusty pink highlighting detected sound classes in an example recording.
Figure 5: Log-Scale Representation of Predicted Audio Classes: Orange bars indicate classes detected in the sample audio; Sky-blue bars show other predicted classes with lower probability scores in log scale; The dashed black line (- - -) represents the decision threshold, above which classes are considered present.
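
The decision rule described in the Figure 5 caption can be sketched as follows: each class receives an independent probability, and a class is reported as present when its probability lies above a threshold. This is a minimal sketch under stated assumptions, not the authors' implementation; the sigmoid output layer, the threshold value of 0.5, the function names, and the class names in the example are all hypothetical.

```python
import math


def sigmoid(x):
    """Map a raw score (logit) to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def detect(logits, class_names, threshold=0.5):
    """Return the classes whose probability lies above the threshold,
    plus the base-10 log of every probability (as a log-scale plot would show).

    threshold=0.5 is an assumed value; the source does not state one.
    """
    probs = {name: sigmoid(z) for name, z in zip(class_names, logits)}
    present = [name for name, p in probs.items() if p > threshold]
    log_probs = {name: math.log10(p) for name, p in probs.items()}
    return present, log_probs


# Hypothetical scores for three made-up classes.
present, log_probs = detect([2.0, -3.0, 0.5], ["rain", "traffic", "crowd"])
```

Because every class is thresholded independently, any number of classes (including none) can be reported for a single recording.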
