Using AI Alignment Theory to understand the potential pitfalls of regulatory frameworks

Alejandro Tlaie

TL;DR

Insights from Alignment Theory research are used to critically examine the European Union's Artificial Intelligence Act, uncovering potential vulnerabilities and areas for improvement in the regulation.

Abstract

This paper leverages insights from Alignment Theory (AT) research, which primarily focuses on the potential pitfalls of technical alignment in Artificial Intelligence, to critically examine the European Union's Artificial Intelligence Act (EU AI Act). AT research has identified several key failure modes, such as proxy gaming, goal drift, reward hacking, and specification gaming, that can arise when AI systems are not properly aligned with their intended objectives. The central question of this paper is: what can we learn if we treat regulatory efforts in the same way as we treat advanced AI systems? By systematically applying these concepts to the EU AI Act, we uncover potential vulnerabilities and areas for improvement in the regulation.

Paper Structure

This paper contains 20 sections and 4 figures.

Figures (4)

  • Figure 1: As societal priorities shift (vertical dashed line), the deployed system stops optimizing for the original reward function (gray, aiding the personal and professional development of students) and instead optimizes for the new reward function (blue, economic payback). The result is a gap between the intended and actual outcomes (double-headed arrow).
  • Figure 2: Illustration of how Proxy Gaming might play out. The system to be deployed may appear to comply with the regulatory requirements, but it does so by focusing on easy cases. When exposed to real-world cases, it would perform much worse. Because of a regulatory gap, this would not be noticeable until the system has already been deployed.
  • Figure 3: Example of how Reward Hacking might be detrimental to a specified set of objectives. In this case, the true objective (blue line) is to hire applicants based on merit and to do so in the most ethically sound way possible. However, what actually gets optimized is the use of buzzwords and pre-defined measures to avoid discriminatory practices.
  • Figure 4: Illustration of a case in which the law’s objectives are not achieved due to Specification Gaming. Economic incentives might make this regulation ineffective, as institutions could deploy AI systems that are effective based on the wrong factors. Later on, further inequality and unfairness are reinforced and perpetuated, limiting access to essential financial services. As in the Proxy Gaming case, this would not be noticeable until the system has already been deployed, due to a regulatory gap.