SongBsAb: A Dual Prevention Approach against Singing Voice Conversion based Illegal Song Covers

Guangke Chen, Yedi Zhang, Fu Song, Ting Wang, Xiaoning Du, Yang Liu

TL;DR

This work tackles illegal SVC-based song covers by introducing SongBsAb, a proactive dual-prevention framework that perturbs singing voices to disrupt both target identity and source lyrics while preserving song quality via backing-track psychoacoustic masking. It relies on a combination of gender-transformation and high/low hierarchy multi-target losses, frame-level interaction reduction, and encoder ensembles to achieve transferability to unseen SVC models. Comprehensive evaluations across English and Chinese datasets show high prevention effectiveness (SRR > 97%), robust transferability to unknown encoders, and resilience against over-the-air and adaptive adversaries, complemented by human studies confirming perceptual harmlessness. The approach offers a practical, open-source pathway to mitigating illegal automated song covers and invites further research into broader rights protections and real-world deployment considerations.

Abstract

Singing voice conversion (SVC) automates song covers by converting a source singer's singing voice into a new singing voice that keeps the lyrics and melody of the source but sounds as if it were performed by the target singer, whose identity is taken from some given target singing voices. However, it raises serious concerns about copyright and civil rights infringement. We propose SongBsAb, the first proactive approach to tackle SVC-based illegal song covers. SongBsAb adds perturbations to singing voices before they are released, so that when they are later used in SVC, the conversion process is disrupted and produces unexpected singing voices. The perturbations are carefully crafted to (1) provide dual prevention, i.e., prevent the singing voice from being used as the source or the target singing voice in SVC, via a gender-transformation loss and a high/low hierarchy multi-target loss, respectively; and (2) be harmless, i.e., have no side effect on the enjoyment of protected songs, by refining a psychoacoustic model-based loss with the backing track as an additional masker, an accompanying element that singing voices have and ordinary speech does not. We also adopt a frame-level interaction reduction-based loss and an encoder ensemble to enhance the transferability of SongBsAb to unknown SVC models. We demonstrate the prevention effectiveness, harmlessness, and robustness of SongBsAb on five diverse and promising SVC models, using both English and Chinese datasets and both objective metrics and subjective metrics from human studies. Our work fosters an emerging research direction for mitigating illegal automated song covers.
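
To make the idea of "adding perturbations before release" concrete, the following is a minimal sketch of a bounded, ensemble-based perturbation, not the method from the paper. Everything in it is an assumption for illustration: the encoders are random linear maps standing in for real SVC encoders, the objective is a plain untargeted embedding distance rather than the gender-transformation or high/low hierarchy multi-target losses, and a simple L-infinity bound `eps` replaces the psychoacoustic, backing-track-aware masking. The function names (`make_toy_encoders`, `ensemble_distance`, `protect`) are hypothetical.

```python
import numpy as np


def make_toy_encoders(n_encoders, dim_in, dim_out, seed=0):
    """Random linear maps used as stand-ins for SVC encoders (assumption)."""
    rng = np.random.default_rng(seed)
    return [rng.standard_normal((dim_out, dim_in)) / np.sqrt(dim_in)
            for _ in range(n_encoders)]


def ensemble_distance(encoders, x, x_ref):
    """Mean squared embedding distance between x and x_ref over all encoders."""
    return float(np.mean([np.sum((W @ x - W @ x_ref) ** 2) for W in encoders]))


def protect(x, encoders, eps=0.01, step=0.002, n_iter=50, seed=1):
    """Sign-gradient ascent on the ensemble distance, clipped to |delta| <= eps.

    This is a generic untargeted objective (assumption), chosen only because
    its gradient has a closed form for linear encoders.
    """
    rng = np.random.default_rng(seed)
    # Random start: the gradient of the distance is exactly zero at delta = 0.
    delta = rng.uniform(-eps, eps, size=x.shape)
    for _ in range(n_iter):
        # d/d(delta) of mean_k ||W_k delta||^2 is mean_k 2 W_k^T W_k delta.
        grad = np.mean([2.0 * W.T @ (W @ delta) for W in encoders], axis=0)
        delta = np.clip(delta + step * np.sign(grad), -eps, eps)
    return x + delta


if __name__ == "__main__":
    x = np.sin(np.linspace(0, 8 * np.pi, 256))  # toy "singing voice" frame
    encoders = make_toy_encoders(n_encoders=3, dim_in=256, dim_out=16)
    x_protected = protect(x, encoders)
    print("max |perturbation|:", np.max(np.abs(x_protected - x)))
    print("ensemble embedding distance:", ensemble_distance(encoders, x_protected, x))
```

Averaging the gradient over several encoders is the part that loosely mirrors the encoder-ensemble idea: the perturbation is not tuned to a single model. A real system would replace the toy encoders with differentiable SVC encoders and the fixed bound with a perceptual constraint.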

Paper Structure (41 sections, 9 equations, 24 figures, 10 tables)

Figures (24)

  • Figure 1: Mainstream Singing Voice Conversion Systems.
  • Figure 2: Overview of SongBsAb. Song owners apply SongBsAb to singing voices ($x_1$ and $x_2$) and obtain the protected counterparts ($\tilde{x}_1$ and $\tilde{x}_2$) to prevent them from being used as source or target singing voices (dual prevention), by disrupting lyrics and singer identity in SVC-covered singing voices, respectively.
  • Figure 3: Overview of the methodology of SongBsAb.
  • Figure 4: Transferability of SongBsAb.
  • Figure 5: Comparison of transferability for identity disruption in terms of identity similarity. AttackVC uses a single encoder, and Best AttackVC means the best result among all encoders.
  • ...and 19 more figures