Carnatic Raga Identification System using Rigorous Time-Delay Neural Network

Sanjay Natesan, Homayoon Beigi

TL;DR

The paper tackles Carnatic raga identification from audio signals, addressing gamakas and shruti variation that challenge traditional pitch-based approaches. It proposes a hybrid TDNN-LSTM pipeline with an attention mechanism that emphasizes relative frequency changes, coupled with custom triangular filter-bank features to better capture Carnatic melodic detail. Evaluated on a dataset of $676$ recordings across $172$ ragas, the approach achieves a validation accuracy of $95.31\%$ (peaking at $96.12\%$), demonstrating scalability beyond Melakarta-focused studies. The work advances computational musicology by reducing preprocessing and handling noisy, real-world recordings, with plans to broaden ragas, shrutis, and noise conditions.
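To illustrate why relative frequency changes are a useful target when shruti varies, the following minimal sketch compares a pitch contour with a copy of itself shifted to a higher tonic. It is not the paper's attention mechanism; the contour values, the two-semitone shift, and the helper name `relative_steps` are all hypothetical and chosen only for illustration.

```python
import numpy as np

def relative_steps(freq_hz):
    """Frame-to-frame pitch change in cents (1200 cents per octave).

    Only frequency ratios enter the result, so multiplying the whole
    contour by a constant (a change of shruti) leaves it unchanged.
    """
    f = np.asarray(freq_hz, dtype=float)
    return 1200.0 * np.diff(np.log2(f))

# Hypothetical pitch contour in Hz, and the same phrase two semitones higher.
phrase = np.array([220.0, 247.5, 275.0, 247.5, 220.0])
shifted = phrase * 2.0 ** (2.0 / 12.0)

steps_original = relative_steps(phrase)
steps_shifted = relative_steps(shifted)

# Absolute frequencies differ, relative steps agree.
same_steps = np.allclose(steps_original, steps_shifted)
same_absolute = np.allclose(phrase, shifted)
```

Here `same_steps` is true while `same_absolute` is false, which is the property a shruti-tolerant representation would need.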

Abstract

Large-scale, machine-learning-based raga identification remains a nontrivial problem in the computational study of Carnatic music. Each raga consists of many unique and intrinsic melodic patterns that can be used to distinguish it from other ragas. The identified ragas can then be used to cluster songs within the same raga, as well as to identify songs in closely related ragas. Here, the input sound is analyzed in several steps, including a Discrete Fourier Transform followed by triangular filtering to create custom bins of possible notes; features are then extracted from the presence or absence of particular notes. A combination of neural networks, namely 1D Convolutional Neural Networks (conventionally known as Time-Delay Neural Networks) and Long Short-Term Memory (LSTM) networks, which are a form of Recurrent Neural Network, forms the backbone of the classification model. In addition, to help with variations in shruti, a long-time attention-based mechanism will be implemented to determine the relative changes in frequency rather than the absolute differences. This provides a much more meaningful data point when training on audio clips in different shrutis. To evaluate the accuracy of the classifier, a dataset of 676 recordings is used, with the songs distributed across the list of ragas. The goal of this program is to effectively and efficiently label a much wider range of audio clips, covering more shrutis, more ragas, and more background noise.
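The feature-extraction step described above (a DFT followed by triangular filters over note bins) can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the 12-bins-per-octave layout, the 110 Hz starting frequency, the Hann window, and the function names `note_centers`, `triangular_filterbank`, and `note_features` are all hypothetical, and the paper's custom bins may be spaced differently.

```python
import numpy as np

def note_centers(f_min=110.0, n_octaves=3, bins_per_octave=12):
    """Geometrically spaced centre frequencies, plus one edge point on each side.

    Assumed layout for illustration only; filter j is centred at
    f_min * 2 ** (j / bins_per_octave).
    """
    k = np.arange(n_octaves * bins_per_octave + 2)
    return f_min * 2.0 ** ((k - 1) / bins_per_octave)

def triangular_filterbank(centers, n_fft, sample_rate):
    """Build one triangle per interior centre: rises from the previous
    centre, peaks at its own centre, and falls to zero at the next one."""
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / sample_rate)
    bank = np.zeros((len(centers) - 2, len(freqs)))
    for i in range(1, len(centers) - 1):
        lo, mid, hi = centers[i - 1], centers[i], centers[i + 1]
        rising = (freqs - lo) / (mid - lo)
        falling = (hi - freqs) / (hi - mid)
        bank[i - 1] = np.clip(np.minimum(rising, falling), 0.0, None)
    return bank

def note_features(frame, bank):
    """Project the windowed DFT magnitude of one frame onto the note filters."""
    spectrum = np.abs(np.fft.rfft(frame * np.hanning(len(frame))))
    return bank @ spectrum

sample_rate, n_fft = 16000, 4096
bank = triangular_filterbank(note_centers(), n_fft, sample_rate)

# Synthetic 220 Hz tone, one octave above the first filter centre.
t = np.arange(n_fft) / sample_rate
frame = np.sin(2.0 * np.pi * 220.0 * t)
features = note_features(frame, bank)
strongest_bin = int(np.argmax(features))
```

Each row of `bank` responds to energy near one candidate note, so a feature vector records which notes are present or absent in a frame. In a pipeline like the one described, a sequence of such vectors would be what the TDNN and LSTM layers consume.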

Paper Structure

This paper contains 11 sections, 4 figures, 2 tables.

Figures (4)

  • Figure 1: System Architecture
  • Figure 2: Model Architecture and Weights
  • Figure 3: Model Training and Validation Loss
  • Figure 4: Model Training and Validation Accuracy