Dynamic HumTrans: Humming Transcription Using CNNs and Dynamic Programming

Shubham Gupta; Isaac Neri Gomez-Sarmiento; Faez Amjed Mezdari; Mirco Ravanelli; Cem Subakan

TL;DR

This work targets humming transcription, a task within Automatic Music Transcription (AMT), using the HumTrans dataset and addressing misalignments in its ground truth. It introduces a CNN-based architecture with Harmonic Stacking and a dynamic programming post-processing step that enforces plausible note transitions, along with a heuristic method for producing more accurate ground-truth onset/offset annotations. The approach achieves state-of-the-art results under both octave-invariant and octave-aware evaluations and is robust to note-length and timing variations, aided by post-processing that converts per-frame note probabilities into a coherent note sequence. Beyond competitive performance, the contribution includes a cleaned, higher-quality subset of the HumTrans data and publicly available code, which should ease adoption and future extension to polyphonic transcription.

Abstract

We propose a novel approach for humming transcription that combines a CNN-based architecture with a dynamic programming-based post-processing algorithm, using the recently introduced HumTrans dataset. We identify and address inherent problems with the onset and offset ground truth provided by the dataset and offer heuristics to improve these annotations, yielding more precise annotations that will aid future research. Additionally, we compare the transcription accuracy of our method against several others, demonstrating state-of-the-art (SOTA) results. All our code and the corrected dataset are available at https://github.com/shubham-gupta-30/humming_transcription
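
To make the idea of dynamic programming post-processing concrete, the sketch below shows one generic way to turn per-frame note probabilities into a coherent note sequence: a Viterbi-style decoder that penalizes every change of note between adjacent frames. It is an illustration under stated assumptions, not the authors' algorithm. The function names (`viterbi_smooth`, `path_to_segments`), the uniform `switch_penalty`, and the input layout (frames by states) are all hypothetical choices made for this example.

```python
import numpy as np


def viterbi_smooth(probs, switch_penalty=2.0):
    """Decode the most likely state path from per-frame probabilities.

    probs: array of shape (n_frames, n_states), e.g. CNN outputs per frame.
    switch_penalty: assumed cost (in log units) for changing state between
        adjacent frames; larger values give longer, steadier notes.
    """
    eps = 1e-9  # avoids log(0)
    log_p = np.log(np.asarray(probs, dtype=float) + eps)
    n_frames, n_states = log_p.shape

    # Transition scores: 0 for staying in a state, -switch_penalty otherwise.
    trans = -switch_penalty * (1.0 - np.eye(n_states))

    score = log_p[0].copy()
    back = np.zeros((n_frames, n_states), dtype=int)
    for t in range(1, n_frames):
        # cand[i, j] is the score of being in state i at t-1 and j at t.
        cand = score[:, None] + trans
        back[t] = np.argmax(cand, axis=0)
        score = cand[back[t], np.arange(n_states)] + log_p[t]

    # Backtrack from the best final state.
    path = np.empty(n_frames, dtype=int)
    path[-1] = int(np.argmax(score))
    for t in range(n_frames - 1, 0, -1):
        path[t - 1] = back[t, path[t]]
    return path


def path_to_segments(path):
    """Group a state path into (state, start_frame, end_frame_exclusive) runs."""
    segments = []
    start = 0
    for t in range(1, len(path) + 1):
        if t == len(path) or path[t] != path[start]:
            segments.append((int(path[start]), start, t))
            start = t
    return segments


# A one-frame glitch (frame 3) is absorbed into the surrounding note.
frames = np.array([[0.9, 0.1]] * 3 + [[0.4, 0.6]] + [[0.9, 0.1]] * 2)
smoothed = viterbi_smooth(frames, switch_penalty=2.0)
notes = path_to_segments(smoothed)
```

With `switch_penalty=0` the decoder reduces to a per-frame argmax, so the penalty is the single knob that trades responsiveness against stability in this sketch. A real system would likely use note-dependent transition costs and an explicit silence state.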

Paper Structure (16 sections, 3 equations, 4 figures, 2 tables)

Introduction
Evaluation metrics
Dataset Challenges
Octave Aware vs Octave Invariance
Transcription methodology
Better ground truth annotation
Network Design
Training
Inference
Results and discussion
Octave invariant
Octave aware
Discussion
Future Work
Heuristic Algorithm for Better Ground Truth Annotations
...and 1 more sections

Figures (4)

Figure 1: Examples where the heuristic algorithm for onset and offset detection succeeds (top) and fails (bottom).
Figure 2: Harmonic Stacking. Source: balhar10melody
Figure 3: Our model architecture - A minimal version of Spotify's BasicPitch model.
Figure 4: Example inference: blue represents the ground truth and red the inferred cleaned notes.
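
Harmonic Stacking, named in Figure 2, is commonly described as stacking frequency-shifted copies of a log-frequency spectrogram along a channel axis, so that a pitch's fundamental and its harmonics line up at the same bin index. The sketch below illustrates that general idea only. The choice of harmonics, the `bins_per_octave` value, and the function name `harmonic_stack` are assumptions for this example and are not taken from the paper.

```python
import numpy as np


def harmonic_stack(spec, harmonics=(0.5, 1, 2, 3), bins_per_octave=12):
    """Stack shifted copies of a log-frequency spectrogram.

    spec: array of shape (n_bins, n_frames) with logarithmically spaced bins.
    Returns an array of shape (len(harmonics), n_bins, n_frames) in which
    channel c at bin k holds the energy found at harmonics[c] times the
    frequency of bin k. Out-of-range positions are zero-padded.
    """
    spec = np.asarray(spec, dtype=float)
    n_bins = spec.shape[0]
    out = np.zeros((len(harmonics),) + spec.shape)
    for c, h in enumerate(harmonics):
        # On a log-frequency axis, multiplying frequency by h is a bin shift.
        shift = int(round(bins_per_octave * np.log2(h)))
        if shift >= 0:
            if shift < n_bins:
                out[c, : n_bins - shift] = spec[shift:]
        else:
            if -shift < n_bins:
                out[c, -shift:] = spec[: n_bins + shift]
    return out


# Toy input: a tone at bin 5 with energy at its 2nd and 3rd harmonics.
toy = np.zeros((36, 1))
toy[5, 0] = 1.0        # fundamental
toy[5 + 12, 0] = 1.0   # 2nd harmonic, one octave up
toy[5 + 19, 0] = 1.0   # 3rd harmonic, about 19 semitones up
stacked = harmonic_stack(toy, harmonics=(1, 2, 3))
```

After stacking, a convolution with a small kernel can see the fundamental and its harmonics at the same spatial position across channels, which is the usual motivation for this representation.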
