Exploring Diverse Sounds: Identifying Outliers in a Music Corpus
Le Cai, Sam Ferguson, Gengfa Fang, Hani Alshamrani
TL;DR
The paper addresses the bias toward similarity in music recommendation by defining 'Genuine' music outliers: complete songs that preserve an artist's structural patterns while diverging in sound. The definition is formalized in terms of $\Phi$, $f_{\Phi}$, $C_G$, $\kappa$, and $N_d$. The authors build a labeled dataset from the Million Song Dataset, extract features such as tempo and loudness, and apply a $\kappa$-means–based detector that classifies outliers into five types, one of which is Genuine. In a set of 320 songs, they report 34 Genuine outliers across 29 artists. Detection works best for single-style artists and is limited for multi-style repertoires, which points to a need for richer features and segmentation. The work motivates incorporating timbre, harmony, and chroma features, enforcing structural constraints, and evaluating alternative clustering models to improve discovery and diversification in music recommendation systems.
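As a rough illustration of the pipeline summarized above, the following minimal sketch standardizes per-song tempo and loudness for one artist, clusters the songs with a small k-means, and flags songs that lie unusually far from their assigned centroid. It is not the authors' detector: the function names, the example values, and the flagging rule (distance greater than three times the median distance) are all assumptions made for illustration, and the sketch does not implement the paper's formal definition or its five outlier types.

```python
import math
import random
import statistics


def standardize(rows):
    """Z-score each feature column so tempo (BPM) and loudness (dB) are comparable."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    # Fall back to 1.0 when a column has zero variance to avoid division by zero.
    stds = [math.sqrt(sum((x - m) ** 2 for x in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]
    return [[(x - m) / s for x, m, s in zip(row, means, stds)] for row in rows]


def assign(points, centroids):
    """Return the index of the nearest centroid for every point."""
    return [min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
            for p in points]


def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's algorithm with a fixed seed for reproducibility."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        labels = assign(points, centroids)
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[j] = [sum(c) / len(members) for c in zip(*members)]
    return centroids, assign(points, centroids)


def flag_outliers(features, k=1, factor=3.0):
    """Flag songs whose distance to their centroid exceeds factor * median distance.

    The threshold rule is an illustrative assumption, not the paper's criterion.
    """
    points = standardize(features)
    centroids, labels = kmeans(points, k)
    dists = [math.dist(p, centroids[lab]) for p, lab in zip(points, labels)]
    cutoff = factor * statistics.median(dists)
    return [i for i, d in enumerate(dists) if d > cutoff]


# Hypothetical (tempo in BPM, loudness in dB) values for one artist's songs.
songs = [
    (120, -8.0), (122, -7.5), (118, -8.2), (121, -7.9), (119, -8.1),
    (123, -7.7), (120, -8.3), (117, -7.8), (70, -20.0),
]
print(flag_outliers(songs))
```

With `k=1` the sketch treats the artist as having a single style, which matches the setting where the summary says detection works best. A multi-style artist would need a larger `k`, and with only two features the clusters may not separate cleanly, which is consistent with the limitation noted above.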
Abstract
Existing research on music recommendation systems primarily focuses on recommending similar music and therefore often neglects diverse and distinctive musical recordings. Musical outliers can provide valuable insights because music itself is inherently diverse. In this paper, we explore music outliers and investigate their potential usefulness for music discovery and recommendation systems. We argue that not all outliers should be treated as noise, as they can offer interesting perspectives and contribute to a richer understanding of an artist's work. We introduce the concept of 'Genuine' music outliers and provide a definition for them. Genuine outliers can reveal unique aspects of an artist's repertoire and have the potential to enhance music discovery by exposing listeners to novel and diverse musical experiences.
