
Development of Large Annotated Music Datasets using HMM-based Forced Viterbi Alignment

S. Johanan Joysingh, P. Vijayalakshmi, T. Nagarajan

TL;DR

This work tackles the data bottleneck in automatic music transcription (AMT) by proposing a controllable pipeline for building a monophonic guitar dataset from scratch. It combines predefined note sequences with hidden Markov model (HMM) based forced Viterbi alignment to produce time-aligned transcriptions. The pipeline requires little manual labeling while keeping onset errors to about 5 ms on average and 10 ms at most. The result is an acoustic plectrum guitar dataset of audio files with corresponding label files suitable for AMT training, pointing to a scalable path toward larger, more realistic datasets. The method emphasizes control over bias, realistic timbre, and practicality, and it provides a foundation for extending to other instruments and to more complex (polyphonic) transcription tasks.
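The core step, aligning a recording to a note sequence that is known in advance, can be illustrated with a minimal forced Viterbi sketch. It assumes one emitting state per note in a strict left-to-right chain, ignores transition probabilities (equivalently, treats them as uniform), and takes a precomputed matrix `loglik` of per-frame log-likelihoods under each note's model. The function name and its inputs are hypothetical; the paper's actual HMM topology, features and training are not described here and may differ.

```python
from typing import List, Sequence


def forced_viterbi_align(loglik: Sequence[Sequence[float]]) -> List[int]:
    """Align T frames to the N notes of a known sequence (hypothetical helper).

    loglik[t][n] is the log-likelihood of frame t under the model of the n-th
    note in the predefined sequence. At each frame the path either stays in the
    current note or advances to the next one; it must start in note 0 and end
    in note N-1. Returns the first frame index of every note.
    """
    if not loglik or not loglik[0]:
        raise ValueError("empty likelihood matrix")
    T, N = len(loglik), len(loglik[0])
    if T < N:
        raise ValueError("need at least one frame per note")

    NEG = float("-inf")
    score = [[NEG] * N for _ in range(T)]  # best path score ending in (t, n)
    back = [[0] * N for _ in range(T)]     # note occupied at frame t - 1
    score[0][0] = loglik[0][0]             # the path is forced to start in note 0

    for t in range(1, T):
        for n in range(N):
            stay = score[t - 1][n]
            advance = score[t - 1][n - 1] if n > 0 else NEG
            if advance > stay:
                score[t][n], back[t][n] = advance + loglik[t][n], n - 1
            else:
                score[t][n], back[t][n] = stay + loglik[t][n], n

    # Backtrack from the last note at the last frame; a change of note between
    # frames t - 1 and t marks the onset of the later note at frame t.
    starts = [0] * N
    n = N - 1
    for t in range(T - 1, 0, -1):
        prev = back[t][n]
        if prev != n:
            starts[n] = t
        n = prev
    return starts
```

Multiplying each returned frame index by the analysis hop (for example, a hypothetical 10 ms hop) converts the note boundaries into onset times in seconds.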

Abstract

Datasets are essential for any machine learning task. Automatic Music Transcription (AMT) is one such task, where a considerable amount of data may be required, depending on how the solution is approached. Because a music dataset complete with audio and time-aligned transcriptions requires the effort of people with musical experience, building one is even more challenging: musical experience is needed both to play the instrument(s) and to annotate and verify the transcriptions. We propose a method that streamlines this process, making it easy and efficient to obtain a dataset for a particular instrument. We accomplish this using predefined guitar exercises and hidden Markov model (HMM) based forced Viterbi alignment. The guitar exercises are designed to be simple. Since the note sequences are already defined, HMM-based forced Viterbi alignment yields time-aligned transcriptions of the recorded audio files. The onsets of the transcriptions are manually verified, and the labels are accurate to within 10 ms, with an average error of 5 ms. The contributions of this work are twofold: i) a streamlined and efficient method for generating datasets for any instrument, especially monophonic ones, and ii) an acoustic plectrum guitar dataset containing wave files and transcriptions in the form of label files. This method is intended as a preliminary step toward building concrete datasets for AMT systems for different instruments.
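The reported accuracy (mean onset error around 5 ms, maximum 10 ms) can be computed by comparing aligned onsets against manually verified ones. The sketch below assumes both lists are given in seconds for the same notes in the same order; the function names and the 44.1 kHz sample rate used for the sample conversion are assumptions, not details taken from the paper.

```python
from statistics import mean
from typing import Sequence, Tuple


def onset_error_stats_ms(aligned_s: Sequence[float],
                         reference_s: Sequence[float]) -> Tuple[float, float]:
    """Return (mean, max) absolute onset error in milliseconds (hypothetical helper).

    aligned_s:   onsets produced by forced alignment, in seconds.
    reference_s: manually verified onsets for the same notes, in seconds.
    """
    if len(aligned_s) != len(reference_s) or not aligned_s:
        raise ValueError("need equal-length, non-empty onset lists")
    errors = [abs(a - r) * 1000.0 for a, r in zip(aligned_s, reference_s)]
    return mean(errors), max(errors)


def ms_to_samples(ms: float, sample_rate: int = 44100) -> float:
    """Convert a time tolerance in ms to samples (44.1 kHz is an assumed rate)."""
    return ms * sample_rate / 1000.0
```

At the assumed 44.1 kHz rate, a 5 ms error spans 220.5 samples and a 10 ms error spans 441 samples, which gives a sense of how small these tolerances are relative to the waveform.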

Paper Structure

This paper contains 12 sections, 3 figures, and 2 tables.

Figures (3)

  • Figure 1: Generic steps involved in creating a dataset incrementally
  • Figure 2: Count of each note in the dataset
  • Figure 3: Time-domain representation of the onset region of a note, showing the actual onset location and onset locations with errors of 5 ms and 10 ms
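A per-note tally like the one in Figure 2 can be produced directly from the label files. This sketch assumes a hypothetical plain-text format with one `start end note` entry per line, separated by whitespace; the dataset's actual label layout is not specified here and may differ.

```python
from collections import Counter
from typing import Iterable


def count_notes(label_lines: Iterable[str]) -> Counter:
    """Count note labels in lines of the assumed form 'start end note'."""
    counts: Counter = Counter()
    for line in label_lines:
        parts = line.split()
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        counts[parts[2]] += 1  # third field is the note name
    return counts
```

To cover a whole dataset, the same function can be fed the lines of every label file, for example by chaining `Path.read_text().splitlines()` over a directory listing, and `Counter.most_common()` then orders the notes by frequency.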