Simulating Policy Impacts: Developing a Generative Scenario Writing Method to Evaluate the Perceived Effects of Regulation
Julia Barnett, Kimon Kieslich, Nicholas Diakopoulos
TL;DR
This paper presents a low-cost, LLM-driven method for evaluating how well a regulatory policy might mitigate AI-induced harms. The authors generate scenario pairs that depict impacts before and after a policy (Article 50 of the EU AI Act) is introduced, then translate these narratives into quantitative human judgments across four risk dimensions. The result is a human-centric, anticipatory governance workflow anchored to a formal impact taxonomy. A case study in the media information ecosystem shows that the policy is perceived to reduce severity and reach in several domains (notably autonomy, labor, media quality, and well-being) while being less effective in others (education, security, social cohesion). The work demonstrates the feasibility of using LLMs for scenario-based policy brainstorming and early-stage evaluation, offering policymakers and researchers a potential tool for iteratively exploring mitigation strategies before costly implementation. The findings underscore both the promise and the limitations of current LLMs in faithfully simulating complex futures shaped by policy, and they highlight areas for methodological refinement and broader application.
Abstract
The rapid advancement of AI technologies yields numerous future impacts on individuals and society. Policymakers are tasked with reacting quickly and establishing policies that mitigate those impacts. However, anticipating the effectiveness of policies is difficult, as some impacts might only be observable in the future and the policies in question might not be applicable to future developments in AI. In this work we develop a method for using large language models (LLMs) to evaluate the efficacy of a given policy at mitigating specified negative impacts. We do so by using GPT-4 to generate scenarios both before and after the introduction of a policy and translating these vivid stories into metrics based on human perceptions of impacts. We leverage an established taxonomy of impacts of generative AI in the media environment to generate a set of scenario pairs, each with a version mitigated and a version not mitigated by the transparency policy in Article 50 of the EU AI Act. We then run a user study (n=234) to evaluate these scenarios across four risk-assessment dimensions: severity, plausibility, magnitude, and specificity to vulnerable populations. We find that this transparency legislation is perceived to be effective at mitigating harms in areas such as labor and well-being, but largely ineffective in areas such as social cohesion and security. Through this case study we demonstrate the efficacy of our method as a tool for iterating on the effectiveness of policy in mitigating various negative impacts. We expect this method to be useful to researchers or other stakeholders who want to brainstorm the potential utility of different policies or other mitigation strategies.
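To make the comparison step concrete, the following is a minimal sketch of how ratings for mitigated and non-mitigated scenarios could be summarized per impact category and risk dimension. It is not the authors' analysis code. The record layout, the function name `mean_shift`, the labels `impact_A` and `impact_B`, the 1 to 5 rating scale, and all scores are hypothetical placeholders chosen only for illustration; they are not data from the study.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical rating records: one row per participant judgment.
# "pre" = scenario without the policy, "post" = scenario with the policy.
# Scores are made-up placeholders on an assumed 1-5 scale.
ratings = [
    {"impact": "impact_A", "dimension": "severity", "condition": "pre", "score": 4},
    {"impact": "impact_A", "dimension": "severity", "condition": "pre", "score": 5},
    {"impact": "impact_A", "dimension": "severity", "condition": "post", "score": 2},
    {"impact": "impact_A", "dimension": "severity", "condition": "post", "score": 3},
    {"impact": "impact_B", "dimension": "severity", "condition": "pre", "score": 4},
    {"impact": "impact_B", "dimension": "severity", "condition": "pre", "score": 4},
    {"impact": "impact_B", "dimension": "severity", "condition": "post", "score": 4},
    {"impact": "impact_B", "dimension": "severity", "condition": "post", "score": 4},
]


def mean_shift(records):
    """Return mean(post) - mean(pre) for each (impact, dimension) pair.

    A negative value means the policy scenario was rated lower on that
    dimension than the scenario without the policy.
    """
    buckets = defaultdict(list)
    for r in records:
        buckets[(r["impact"], r["dimension"], r["condition"])].append(r["score"])

    shifts = {}
    for impact, dimension, condition in list(buckets):
        if condition != "pre":
            continue
        post_scores = buckets.get((impact, dimension, "post"))
        if not post_scores:
            continue  # no mitigated counterpart to compare against
        pre_scores = buckets[(impact, dimension, "pre")]
        shifts[(impact, dimension)] = mean(post_scores) - mean(pre_scores)
    return shifts


shifts = mean_shift(ratings)
for key, value in sorted(shifts.items()):
    print(key, value)
```

A difference in means is only a descriptive summary; a real analysis would presumably also need a significance test suited to the study design, which this sketch does not attempt.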
