Zero-Shot Crate Digging: DJ Tool Retrieval Using Speech Activity, Music Structure And CLAP Embeddings

Iroro Orife

TL;DR

This work demonstrates a novel system that retrieves (or rediscovers) compelling DJ tools for live or studio use. It combines open-source libraries for speech/music activity detection and music boundary analysis with a Contrastive Language-Audio Pretraining (CLAP) model for zero-shot audio classification.

Abstract

In genres like Hip-Hop, RnB, Reggae, Dancehall and just about every Electronic/Dance/Club style, DJ tools are a special set of audio files curated to heighten the DJ's musical performance and creative mixing choices. In this work, we present an approach to discovering DJ tools in personal music collections. Leveraging open-source libraries for speech/music activity detection and music boundary analysis, together with a Contrastive Language-Audio Pretraining (CLAP) model for zero-shot audio classification, we demonstrate a novel system designed to retrieve (or rediscover) compelling DJ tools for use live or in the studio.

Paper Structure

This paper contains 8 sections, 2 equations, 2 figures, 2 tables, 1 algorithm.

Figures (2)

  • Figure 1: A 5-minute Ragga Jungle song overlaid with detected speech and music activity, as well as music-structural boundaries.