Dynamic HumTrans: Humming Transcription Using CNNs and Dynamic Programming
Shubham Gupta, Isaac Neri Gomez-Sarmiento, Faez Amjed Mezdari, Mirco Ravanelli, Cem Subakan
TL;DR
This work targets humming transcription, a subtask of Automatic Music Transcription (AMT), using the HumTrans dataset and addressing misalignments in its ground truth. It introduces a CNN-based architecture with Harmonic Stacking and a dynamic programming post-processing step that enforces plausible note transitions, along with a heuristic method for producing more accurate ground-truth onset/offset annotations. The approach achieves state-of-the-art results under both octave-invariant and octave-aware evaluations and is robust to variations in note length and timing, aided by post-processing that converts per-frame note probabilities into a coherent note sequence. Beyond competitive performance, the contribution includes a cleaned, higher-quality subset of the HumTrans data and publicly available code, enabling broader adoption and future extension to polyphonic transcription tasks.
Abstract
We propose a novel approach for humming transcription that combines a CNN-based architecture with a dynamic programming-based post-processing algorithm, utilizing the recently introduced HumTrans dataset. We identify and address inherent problems with the onset and offset ground truth provided by the dataset and offer heuristics to improve these annotations, resulting in a dataset with precise annotations that will aid future research. Additionally, we compare the transcription accuracy of our method against several others, demonstrating state-of-the-art (SOTA) results. All our code and the corrected dataset are available at https://github.com/shubham-gupta-30/humming_transcription
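To illustrate how dynamic programming can turn per-frame note probabilities into a coherent note sequence, here is a minimal Viterbi-style sketch. It is not the algorithm from the paper: the function name `smooth_notes`, the `switch_penalty` parameter, and the uniform cost for changing notes are assumptions chosen for illustration only.

```python
import numpy as np


def smooth_notes(frame_probs, switch_penalty=2.0):
    """Pick one note class per frame, penalising changes between frames.

    frame_probs: array-like of shape (n_frames, n_classes) with per-frame
    note probabilities (hypothetical model output).
    switch_penalty: assumed log-domain cost for changing class.
    """
    frame_probs = np.asarray(frame_probs, dtype=float)
    n_frames, n_classes = frame_probs.shape
    log_p = np.log(frame_probs + 1e-9)  # small constant avoids log(0)

    # Transition score: 0 for staying on a note, -switch_penalty otherwise.
    trans = -switch_penalty * (1.0 - np.eye(n_classes))

    score = log_p[0].copy()
    back = np.zeros((n_frames, n_classes), dtype=int)
    for t in range(1, n_frames):
        cand = score[:, None] + trans  # cand[i, j]: previous i -> current j
        back[t] = np.argmax(cand, axis=0)
        score = cand[back[t], np.arange(n_classes)] + log_p[t]

    # Backtrack from the best final class.
    path = np.zeros(n_frames, dtype=int)
    path[-1] = int(np.argmax(score))
    for t in range(n_frames - 1, 0, -1):
        path[t - 1] = back[t, path[t]]
    return path.tolist()


# A weak one-frame blip towards class 2 in an otherwise steady class-1 note.
probs = [
    [0.1, 0.8, 0.1],
    [0.1, 0.8, 0.1],
    [0.1, 0.4, 0.5],
    [0.1, 0.8, 0.1],
]
print(smooth_notes(probs, switch_penalty=2.0))
print(smooth_notes(probs, switch_penalty=0.0))
```

With a positive penalty the isolated blip is absorbed into the surrounding note, while a penalty of zero reduces to a per-frame argmax. A real system would likely use transition costs that depend on the musical interval between notes rather than a single uniform penalty.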
