
Attention-guided Spectrogram Sequence Modeling with CNNs for Music Genre Classification

Aditya Sridhar

TL;DR

This work bridges the gap between technical classification tasks and the nuanced, human experience of genre. It highlights cross-genre similarities and distinctiveness in a way that aligns closely with human musical intuition.

Abstract

Music genre classification is a core component of music recommendation systems, generation algorithms, and cultural analytics. In this work, we present a model that classifies music genres using attention-based temporal signature modeling. By processing spectrogram sequences through Convolutional Neural Networks (CNNs) and multi-head attention layers, our approach identifies the most temporally significant moments within each piece and combines them into a unique "signature" for genre identification. This temporal focus not only enhances classification accuracy but also reveals genre-specific characteristics that can be intuitively mapped to listener perceptions. By highlighting cross-genre similarities and distinctiveness in a way that aligns closely with human musical intuition, our findings offer potential applications in personalized music recommendation systems. This work bridges the gap between technical classification tasks and the nuanced, human experience of genre.
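To make the idea of an attention-derived temporal "signature" concrete, the following is a minimal sketch and not the architecture used in this work. It assumes each spectrogram segment has already been mapped to an embedding vector (the CNN stage is omitted and replaced by random stand-in values), and it uses one hypothetical query vector per head to weight the segments. The function name `multi_head_attention_pool`, the shapes, and the pooling scheme are all illustrative assumptions.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def multi_head_attention_pool(segments, queries):
    """Pool a sequence of segment embeddings into one fixed-size vector.

    segments: (T, D) array, one embedding per spectrogram segment
              (a stand-in for CNN outputs; assumption, not from the paper).
    queries:  (H, D // H) array, one hypothetical query vector per head.
    Returns (signature, weights) with shapes (D,) and (H, T).
    """
    T, D = segments.shape
    H, d_head = queries.shape
    assert H * d_head == D, "embedding size must split evenly across heads"

    # Split each embedding into H heads: (H, T, d_head).
    heads = segments.reshape(T, H, d_head).transpose(1, 0, 2)

    # Scaled dot-product score of every segment against each head's query.
    scores = np.einsum("htd,hd->ht", heads, queries) / np.sqrt(d_head)

    # Softmax over time: how much each segment contributes, per head.
    weights = softmax(scores, axis=1)

    # Weighted sum over time, then concatenate the heads.
    pooled = np.einsum("ht,htd->hd", weights, heads)
    return pooled.reshape(D), weights


# Illustrative random inputs: 12 segments, 32-dim embeddings, 4 heads.
rng = np.random.default_rng(0)
segments = rng.normal(size=(12, 32))
queries = rng.normal(size=(4, 8))
signature, weights = multi_head_attention_pool(segments, queries)
```

Under these assumptions, `signature` is a fixed-length vector that could feed a genre classifier, and each row of `weights` indicates which segments a given head emphasized, which is the kind of quantity one might inspect when relating model attention to listener perception.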

Paper Structure

This paper contains 20 sections.