ExploreGen: Large Language Models for Envisioning the Uses and Risks of AI Technologies

Viviane Herdel, Sanja Šćepanović, Edyta Bogucka, Daniele Quercia

TL;DR

This work tackles the challenge of envisioning AI uses and regulatory risks early in development by introducing ExploreGen, an LLM-based framework that generates diverse uses of a technology and assesses risk per the EU AI Act. Using Facial Recognition and Analysis as a case, the authors demonstrate that UsesGen can produce realistic uses, including overlooked ones, and that RiskLabelling achieves high alignment with expert classifications. The framework combines a generation module, a legal-risk classification module, and a literature-mapping filter (OverlookedFilter), and is evaluated through scoping reviews and nine practitioner studies, revealing strong literature coverage and practical utility for ideation and compliance. Despite promising results, the study notes limitations in data biases and inter-rater variability and suggests avenues for broader generalization and deeper, domain-specific analyses to further support responsible AI design across technologies and contexts.

Abstract

Responsible AI design is increasingly seen as an imperative by both AI developers and AI compliance experts. One of the key tasks is envisioning AI technology uses and risks. Recent studies on model and data cards reveal that AI practitioners struggle with this task due to its inherently challenging nature. Here, we demonstrate that leveraging a Large Language Model (LLM) can support AI practitioners in this task by enabling reflexivity, brainstorming, and deliberation, especially in the early design stages of the AI development process. We developed an LLM framework, ExploreGen, which generates realistic and varied uses of AI technology, including those overlooked by research, and classifies their risk level based on the EU AI Act regulation. We evaluated our framework using the case of Facial Recognition and Analysis technology in nine user studies with 25 AI practitioners. Our findings show that ExploreGen is helpful to both developers and compliance experts. The practitioners rated the uses as realistic and their risk classification as accurate (94.5%). Moreover, while unfamiliar with many of the uses, they rated them as having high adoption potential and transformational impact.

Paper Structure (21 sections, 6 figures, 3 tables)

Figures (6)

  • Figure 1: Our methodology consists of three steps. In the first two steps, ExploreGen performs (i) generation (UsesGen) of various uses for a given AI technology and (ii) assessment (RiskLabelling, OverlookedFilter) of those uses, classifying their risk under the EU AI Act and determining whether they are discussed or overlooked in previous literature. In the last step (iii), we evaluated the generated uses and their risk classification, including how realistic the uses are, the accuracy of the risk assessment, and the usefulness for AI practitioners in envisioning the impacts of AI technology.
  • Figure 2: The scoping review: identification, screening, and assessment for eligibility of articles. Starting with 131 initial papers identified, a total of 97 were included. From these papers, 75 unique FRA uses were identified (Appendix C).
  • Figure 3: Evaluation results for the five quantitative metrics: familiarity with the use, its adoption potential, transformational impact, and perceived riskiness for society as a whole and for the environment.
  • Figure 4: UsesGen. The prompt generates a list of uses for a given AI technology, e.g., FRA. These LLM-generated uses must be output in the format of five risk concepts (domain, purpose, capability, AI user, AI subject) [Golpayegani2023Risk]. This format allows the subsequent RiskLabelling prompt to evaluate the risk of a given AI technology use. To identify the most comprehensive and realistic list of LLM-generated uses, we examined different UsesGen configurations. These configurations varied the model temperature, the number of requested uses per domain (2 or 3), and the prompt elements (Variations 1-3). Variation 1 of UsesGen comprised the necessary elements: an instruction (A), definitions of the risk concepts and of the three categories of being realistic (B), and domains (C) (see the configurations figure). In Variation 2, we introduced the system role (D), while in Variation 3, we included an additional five examples (E).
  • Figure 5: RiskLabelling. The prompt evaluates how risky the LLM-generated uses are. Specifically, the objective is to classify each LLM-generated use in the list as unacceptable risk, high risk, or neither unacceptable nor high risk. The Risk Assessment prompt includes Instructions (A), Relevant Sections of the EU AI Act defining what is unacceptable and what is high risk, together with the amendments (i.e., Annex III and its amendments) (B), an LLM-generated use (C), Output Structure (D), and a System Role (E).
  • ...and 1 more figures
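
The captions for Figures 4 and 5 describe a structured format for generated uses (five risk concepts), three risk classes, and a grid of UsesGen prompt configurations. The following is a minimal sketch of how those pieces might be represented, not the authors' implementation. All class and function names (`Use`, `RiskLabel`, `usesgen_configurations`, `format_use`) are hypothetical, the temperature values are placeholders because the captions do not state which temperatures were tried, and the example use is invented purely for illustration.

```python
from dataclasses import dataclass
from enum import Enum
from itertools import product


@dataclass(frozen=True)
class Use:
    """One generated use, expressed as the five risk concepts from the Figure 4 caption."""
    domain: str
    purpose: str
    capability: str
    ai_user: str
    ai_subject: str


class RiskLabel(Enum):
    """The three output classes named in the Figure 5 caption."""
    UNACCEPTABLE = "unacceptable risk"
    HIGH = "high risk"
    NEITHER = "neither unacceptable nor high risk"


# Prompt elements per UsesGen variation, following the Figure 4 caption:
# A = instruction, B = definitions, C = domains, D = system role, E = five examples.
VARIATION_ELEMENTS = {
    1: ("A", "B", "C"),
    2: ("A", "B", "C", "D"),
    3: ("A", "B", "C", "D", "E"),
}


def usesgen_configurations(temperatures):
    """Enumerate every (temperature, uses per domain, variation) combination."""
    return [
        {
            "temperature": t,
            "uses_per_domain": n,
            "variation": v,
            "elements": VARIATION_ELEMENTS[v],
        }
        for t, n, v in product(temperatures, (2, 3), sorted(VARIATION_ELEMENTS))
    ]


def format_use(use):
    """Render a use as labelled lines, e.g. for insertion into a classification prompt."""
    return "\n".join(
        f"{label}: {value}"
        for label, value in (
            ("Domain", use.domain),
            ("Purpose", use.purpose),
            ("Capability", use.capability),
            ("AI user", use.ai_user),
            ("AI subject", use.ai_subject),
        )
    )


# Placeholder temperatures (assumption: the actual values are not given in the captions).
configs = usesgen_configurations([0.0, 1.0])

# Invented example use, for illustration only.
example = Use(
    domain="Healthcare",
    purpose="Monitoring patient discomfort",
    capability="Facial expression analysis",
    ai_user="Nursing staff",
    ai_subject="Hospital patients",
)
```

With two placeholder temperatures, the grid holds 2 × 2 × 3 = 12 configurations. Keeping the use as a fixed five-field record mirrors the caption's point that a shared format lets the classification prompt consume whatever the generation prompt produces.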