Computational Copyright: Towards A Royalty Model for Music Generative AI
Junwei Deng, Xirui Jiang, Shiyuan Zhang, Shichang Zhang, Himabindu Lakkaraju, Ruijiang Gao, Chris Donahue, Jiaqi W. Ma
TL;DR
The paper tackles how to sustain creative incentives for AI-generated music by proposing Generative Content ID, a causal attribution framework that ties AI outputs to their training data via Training Data Attribution (TDA). It formalizes leave-one-out influence as a counterfactual utility difference and uses scalable gradient-based TDA methods (TRAK, LoGra) to approximate that influence without retraining. Empirical analysis on MAESTRO and TheoryTab shows that TDA closely tracks retraining-based attribution, whereas similarity-based legal proxies capture data influence only imperfectly, particularly for less obvious contributors. The authors also simulate economic outcomes under different royalty schemes, demonstrating that distribution mechanisms can significantly shape income inequality and platform governance. Overall, the work offers a principled, scalable foundation for royalty-based governance of music generative AI and highlights regulatory implications for fair compensation of data contributors.
Abstract
The rapid rise of generative AI has intensified copyright and economic tensions in creative industries, particularly in music. Current approaches to this challenge often focus on preventing infringement or establishing one-time licensing, neither of which provides the sustainable, recurring economic incentives necessary to maintain creative ecosystems. To address this gap, we propose Generative Content ID, a framework for scalable and faithful royalty attribution in music generative AI. Adapting the idea of YouTube's Content ID, it attributes the value of AI-generated music back to the specific training content that causally influenced its generation, a process we term causal attribution. However, naively quantifying this causal influence requires counterfactually retraining the model on subsets of the training data, which is infeasible at scale. We address this challenge using efficient Training Data Attribution (TDA) methods to approximate causal attribution at scale. We further conduct an empirical analysis of the framework on public and proprietary datasets. First, we demonstrate that the scalable TDA methods provide a faithful approximation of the "gold-standard" but costly retraining-based causal attribution, showing the feasibility of the proposed royalty framework. Second, we investigate the relationship between the perceived similarity employed in legal practice and our causal attribution, which reflects the actual mechanics of AI training. We find that while perceived similarity can capture the most influential samples, it fails to account for the broader data contribution that drives model utility, suggesting that similarity-based legal proxies are ill-suited for royalty distribution. Overall, this work provides a principled and operational foundation for royalty-based economic governance of music generative AI.
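To make the retraining-based notion of causal attribution concrete, the following is a minimal sketch on a toy problem, not the paper's implementation. It assumes a one-parameter ridge regression as a stand-in for the generative model and negative squared error on a single query as a stand-in for utility. All names (`fit_slope`, `utility`, `loo_influence`, `royalty_shares`) are hypothetical, and the royalty rule shown (clip negative influence to zero, then split proportionally) is an illustrative assumption rather than a scheme taken from the paper. The influence of a training sample is computed as the utility of the model trained on the full data minus the utility of the model retrained without that sample.

```python
from typing import List, Sequence, Tuple

# One (feature, target) pair stands in for one piece of training content.
Sample = Tuple[float, float]


def fit_slope(data: Sequence[Sample], ridge: float = 1e-3) -> float:
    """Fit y ~ w * x by ridge-regularised least squares (closed form)."""
    sxy = sum(x * y for x, y in data)
    sxx = sum(x * x for x, _ in data)
    return sxy / (sxx + ridge)


def utility(w: float, query: Sample) -> float:
    """Toy utility on one output: negative squared error (higher is better)."""
    x, y = query
    return -((w * x - y) ** 2)


def loo_influence(data: Sequence[Sample], query: Sample) -> List[float]:
    """Leave-one-out influence: U(full model) - U(model retrained without i)."""
    full_utility = utility(fit_slope(data), query)
    scores = []
    for i in range(len(data)):
        held_out = list(data[:i]) + list(data[i + 1:])
        scores.append(full_utility - utility(fit_slope(held_out), query))
    return scores


def royalty_shares(scores: Sequence[float]) -> List[float]:
    """Illustrative rule (an assumption): clip negatives, split proportionally."""
    positive = [max(s, 0.0) for s in scores]
    total = sum(positive)
    if total == 0.0:
        return [0.0] * len(scores)
    return [p / total for p in positive]


if __name__ == "__main__":
    # Three samples consistent with y = 2x, plus one outlier.
    train = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (1.0, -5.0)]
    scores = loo_influence(train, query=(2.0, 4.0))
    for sample, score, share in zip(train, scores, royalty_shares(scores)):
        print(sample, round(score, 4), round(share, 4))
```

The loop retrains once per training sample, which is cheap here only because the toy model has a closed-form fit. For a real generative model each retraining is a full training run, which is the cost that gradient-based TDA methods are meant to avoid.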
