Probabilistic Analysis of Copyright Disputes and Generative AI Safety
Hiroaki Chiba-Okabe
TL;DR
This paper develops a probabilistic formalism for evaluating copyright infringement disputes and the copyright safety of generative AI. It formalizes evidentiary principles using binary random variables and conditional probabilities, which permits a rigorous analysis of the inverse ratio rule and of the clearly specified monotonicity assumptions under which it applies. It then assesses Near Access-Freeness (NAF) as a training-data risk-mitigation condition, deriving bounds such as $P(Z=z \mid \mathrm{Access}=1) \le e^{\epsilon} P(Z=z \mid \mathrm{Access}=0)$ and linking NAF to inference about access via $A_M$ and $EA_M$ through the function $\Gamma(\epsilon, \delta)$. The results show that the inverse ratio rule can be justified under natural assumptions and that NAF can reduce infringement risk, but that both face normative concerns and practical limitations, including transparency and retrospective attribution.
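As a brief illustration of why a bound of this form bears on inference about access, the following sketch applies Bayes' rule to it. It is not taken from the paper's own derivation, and it assumes that $P(Z=z \mid \mathrm{Access}=0) > 0$ and $0 < P(\mathrm{Access}=1) < 1$ so that the ratios are well defined.

\[
\frac{P(\mathrm{Access}=1 \mid Z=z)}{P(\mathrm{Access}=0 \mid Z=z)}
= \frac{P(Z=z \mid \mathrm{Access}=1)}{P(Z=z \mid \mathrm{Access}=0)} \cdot \frac{P(\mathrm{Access}=1)}{P(\mathrm{Access}=0)}
\le e^{\epsilon} \, \frac{P(\mathrm{Access}=1)}{P(\mathrm{Access}=0)}
\]

Under these assumptions, observing the output $Z=z$ can raise the odds of access by a factor of at most $e^{\epsilon}$ over the prior odds, so a small $\epsilon$ means the output itself carries little evidence of access.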
Abstract
This paper presents a probabilistic approach to analyzing copyright infringement disputes. Evidentiary principles shaped by case law are formalized in probabilistic terms, and the ``inverse ratio rule'' -- a controversial legal doctrine adopted by some courts -- is examined. Although this rule has faced significant criticism, a formal proof demonstrates its validity, provided it is properly defined. The probabilistic approach is further employed to study the copyright safety of generative AI. Specifically, the Near Access-Free (NAF) condition, previously proposed as a strategy for mitigating the heightened copyright infringement risks of generative AI, is evaluated. The analysis reveals limitations in its justifiability and efficacy.
