Automatic Detection and Analysis of Singing Mistakes for Music Pedagogy
Sumit Kumar, Suraj Jaiswal, Parampreet Singh, Vipul Arora
TL;DR
The paper tackles automatic detection of singing mistakes in Indian Art Music by introducing the M3 dataset of synchronized teacher–learner recordings with frame-level annotations for pitch and amplitude errors. It benchmarks rule-based and deep learning approaches (CNN, CRNN, TCN) under a collar-based evaluation framework, demonstrating that learning-based methods outperform baselines and that temporal models (TCN) capture error continuity effectively. A systematic analysis across data splits and cross-teacher settings reveals both generalizable patterns and teacher-specific annotation tolerances, guiding pedagogy-focused feedback. The work provides a publicly available dataset, models, and evaluation methodology that can inform practical, interpretable, and real-time feedback tools in music education.
Abstract
The advancement of machine learning in audio analysis has opened new possibilities for technology-enhanced music education. This paper introduces a framework for automatic singing mistake detection in the context of music pedagogy, supported by a newly curated dataset. The dataset comprises synchronized teacher-learner vocal recordings, with annotations marking different types of mistakes made by learners. Using this dataset, we develop and benchmark several deep learning models for mistake detection. To compare the efficacy of mistake detection systems, we propose a new evaluation methodology. Experiments indicate that the proposed learning-based methods outperform rule-based methods. A systematic study of errors and a cross-teacher study reveal insights into music pedagogy that can be utilised for various music applications. This work sets out new directions of research in music pedagogy. The code and dataset are publicly available.
