Global AI Governance Overview: Understanding Regulatory Requirements Across Global Jurisdictions

Mariia Kyrychenko, Mykyta Mudryi, Markiyan Chaklosh

TL;DR

This paper maps the global AI governance landscape with a focus on training-data regulation, highlighting gaps in enforcement and preprocessing controls. It analyzes EU, US, and Asia-Pacific regimes, detailing specific obligations, penalties, and cross-jurisdictional frictions, and proposes a multilayer pre-training data filtering pipeline to shift protection upstream. Key contributions include a comprehensive regulatory inventory, identification of enforcement gaps, and the integration of ISO, NIST RMF, and AI-TRiSM frameworks as practical cross-border governance aids. The work underscores the need for real-time compliance monitoring, harmonized standards, and governance models adaptable to autonomous AI systems within a rapidly evolving regulatory ecosystem.

Abstract

The rapid advancement of general-purpose AI models has increased concerns about copyright infringement in training data, yet current regulatory frameworks remain predominantly reactive rather than proactive. This paper examines the regulatory landscape of AI training data governance in major jurisdictions, including the EU, the United States, and the Asia-Pacific region. It also identifies critical gaps in enforcement mechanisms that threaten both creator rights and the sustainability of AI development. Through analysis of major cases, we identify critical gaps in pre-training data filtering. Existing solutions such as transparency tools, perceptual hashing, and access control mechanisms address only specific aspects of the problem and cannot prevent initial copyright violations. We identify two fundamental challenges: pre-training license collection and content filtering, which face the impossibility of comprehensive copyright management at scale; and verification mechanisms, which lack tools to confirm that filtering prevented infringement. We propose a multilayered filtering pipeline that combines access control, content verification, machine learning classifiers, and continuous database cross-referencing to shift copyright protection from post-training detection to pre-training prevention. This approach offers a pathway toward protecting creator rights while enabling continued AI innovation.

Paper Structure

This paper contains 32 sections, 1 table.