AI-Generated Music Detection in Broadcast Monitoring
David Lopez-Ayala, Asier Cabello, Pablo Zinemanas, Emilio Molina, Martin Rocamora
TL;DR
The paper tackles the detection of AI-generated music in broadcast environments, where music often appears as brief, speech-masked excerpts rather than the clean, full-length tracks typical of streaming. It introduces AI-OpenBMAT, a broadcast-oriented benchmark built by extending OpenBMAT with AI-generated continuations from Suno v3.5 and realistic loudness patterns, totaling 54.9 hours across 3,294 one-minute excerpts. Through three controlled experiments (SNR robustness, duration robustness, and full broadcast evaluation), the authors benchmark a CNN baseline and SpectTTTra variants and find that F1-scores fall below 60% when music is in the background or short in duration. The dataset is intended to drive development of detectors robust to speech masking and short-duration cues, with clear implications for industrial broadcast monitoring.
Abstract
AI music generators have advanced to the point where their outputs are often indistinguishable from human compositions. While detection methods have emerged, they are typically designed and validated in music streaming contexts with clean, full-length tracks. Broadcast audio, however, poses a different challenge: music appears as short excerpts, often masked by dominant speech, conditions under which existing detectors fail. In this work, we introduce AI-OpenBMAT, the first dataset tailored to broadcast-style AI-music detection. It contains 3,294 one-minute audio excerpts (54.9 hours) that follow the duration patterns and loudness relations of real television audio, combining human-made production music with stylistically matched continuations generated with Suno v3.5. We benchmark a CNN baseline and state-of-the-art SpectTTTra models to assess SNR and duration robustness, and evaluate on a full broadcast scenario. Across all settings, models that excel in streaming scenarios suffer substantial degradation, with F1-scores dropping below 60% when music is in the background or has a short duration. These results highlight speech masking and short music length as critical open challenges for AI music detection, and position AI-OpenBMAT as a benchmark for developing detectors capable of meeting industrial broadcast requirements.
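To make the speech-masking condition concrete, the following is a minimal sketch of how a music excerpt could be placed under speech at a chosen level ratio. It is not the authors' pipeline: the function names (`rms`, `level_ratio_db`, `mix_at_snr`) are hypothetical, the levels are plain RMS rather than a perceptual loudness measure, and treating music as the "signal" and speech as the masker in the SNR is an assumption made for illustration.

```python
import math
import random


def rms(signal):
    """Root-mean-square level of a sequence of samples."""
    return math.sqrt(sum(x * x for x in signal) / len(signal))


def level_ratio_db(music, speech):
    """Music-to-speech RMS ratio in dB (assumed SNR convention)."""
    return 20.0 * math.log10(rms(music) / rms(speech))


def mix_at_snr(music, speech, snr_db):
    """Scale `music` so its level relative to `speech` equals `snr_db`, then sum.

    Negative `snr_db` puts the music in the background, under the speech.
    Returns (mixture, scaled_music).
    """
    if len(music) != len(speech):
        raise ValueError("music and speech must have the same length")
    music_rms, speech_rms = rms(music), rms(speech)
    if music_rms == 0.0 or speech_rms == 0.0:
        raise ValueError("both signals must be non-silent")
    # Target music RMS follows from the definition of a dB amplitude ratio.
    gain = speech_rms * 10.0 ** (snr_db / 20.0) / music_rms
    scaled_music = [gain * m for m in music]
    mixture = [m + s for m, s in zip(scaled_music, speech)]
    return mixture, scaled_music


if __name__ == "__main__":
    random.seed(0)
    sample_rate = 16000  # illustrative value, not taken from the paper
    # Synthetic stand-ins: a 440 Hz tone for music, Gaussian noise for speech.
    music = [math.sin(2 * math.pi * 440 * n / sample_rate) for n in range(sample_rate)]
    speech = [random.gauss(0.0, 0.1) for _ in range(sample_rate)]
    mixture, scaled = mix_at_snr(music, speech, snr_db=-10.0)
    print(f"music-to-speech ratio: {level_ratio_db(scaled, speech):.2f} dB")
```

Sweeping `snr_db` over a range of values and scoring a detector on each resulting mixture is one way an SNR-robustness curve of the kind described above could be produced, under these assumptions.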
