Controlling Contrastive Self-Supervised Learning with Knowledge-Driven Multiple Hypothesis: Application to Beat Tracking
Antonin Gagnere, Slim Essid, Geoffroy Peeters
TL;DR
The paper tackles ambiguity in beat and downbeat labeling by introducing Knowledge-Driven Multi-Hypothesis Learning (KD-MHL), which guides contrastive self-supervised pre-training with multiple domain-informed hypotheses about which samples form positive pairs. The method comprises an encoder with multiple projection heads, a scoring function for hypothesis compatibility, and a selector that retains the top-scoring hypotheses to form the SSL loss. It is instantiated for musical rhythm analysis with hypotheses based on the predominant local pulse (PLP), and a self-training variant is also explored. On beat and downbeat tracking benchmarks, KD-MHL achieves state-of-the-art results after pre-training and fine-tuning, often surpassing prior methods, and the self-training variant yields additional gains on most datasets. The work shows that embedding domain knowledge into SSL and leveraging multiple plausible interpretations can substantially improve MIR representations and downstream rhythm tasks.
Abstract
Ambiguities in data and problem constraints can lead to diverse, equally plausible outcomes for a machine learning task. In beat and downbeat tracking, for instance, different listeners may adopt various rhythmic interpretations, none of which would necessarily be incorrect. To address this, we propose a contrastive self-supervised pre-training approach that leverages multiple hypotheses about possible positive samples in the data. Our model is trained to learn representations compatible with different such hypotheses, which are selected with a knowledge-based scoring function to retain the most plausible ones. When fine-tuned on labeled data, our model outperforms existing methods on standard benchmarks, showcasing the advantages of integrating domain knowledge with multi-hypothesis selection in music representation learning in particular.
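To make the idea of selecting hypotheses and forming a contrastive loss more concrete, here is a minimal NumPy sketch. It is not the authors' implementation: the InfoNCE formulation, the one-head-per-hypothesis pairing, the plain top-k selection, and all names (`info_nce`, `multi_hypothesis_loss`, `head_embeddings`, `positives`, `scores`) are illustrative assumptions, and the scores are hard-coded stand-ins for a knowledge-based scoring function.

```python
import numpy as np


def info_nce(z, pos_idx, temperature=0.1):
    """InfoNCE loss where anchor z[i] has positive z[pos_idx[i]].

    Assumes pos_idx[i] != i for every i (an anchor is not its own positive).
    """
    z = z / np.linalg.norm(z, axis=1, keepdims=True)  # use cosine similarity
    logits = z @ z.T / temperature
    np.fill_diagonal(logits, -np.inf)  # exclude each anchor from its own candidates
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -log_prob[np.arange(len(z)), pos_idx].mean()


def multi_hypothesis_loss(head_embeddings, positives, scores, top_k=2):
    """Average the contrastive loss over the top_k highest-scoring hypotheses.

    head_embeddings: (n_hyp, n_frames, dim), one projection head per hypothesis
                     (an assumption made for this sketch).
    positives:       (n_hyp, n_frames) index of the positive for each anchor.
    scores:          (n_hyp,) plausibility score of each hypothesis.
    """
    keep = np.argsort(scores)[::-1][:top_k]  # indices of the best hypotheses
    losses = [info_nce(head_embeddings[h], positives[h]) for h in keep]
    return float(np.mean(losses)), keep


# Toy data: 3 hypotheses, 8 frames, 4-dimensional projections.
rng = np.random.default_rng(0)
n_hyp, n_frames, dim = 3, 8, 4
head_embeddings = rng.normal(size=(n_hyp, n_frames, dim))

# Hypothetical hypotheses: hypothesis h pairs frame i with frame (i + h + 1),
# wrapping around, as a stand-in for different candidate beat periods.
positives = np.stack(
    [(np.arange(n_frames) + h + 1) % n_frames for h in range(n_hyp)]
)

# Placeholder scores; in practice these would come from domain knowledge.
scores = np.array([0.2, 0.9, 0.5])

loss, kept = multi_hypothesis_loss(head_embeddings, positives, scores, top_k=2)
print("kept hypotheses:", kept.tolist(), "loss:", loss)
```

Only the retained hypotheses contribute to the loss, so an implausible interpretation (here the lowest-scoring one) does not shape the learned representation. How the selected losses are combined, and whether heads are tied to hypotheses, are design choices this sketch fixes arbitrarily.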
