Automated Analysis of Global AI Safety Initiatives: A Taxonomy-Driven LLM Approach

Takayuki Semitsu, Naoto Kiribuchi, Kengo Zenitani

Abstract

We present an automated crosswalk framework that compares pairs of AI safety policy documents under a shared taxonomy of activities. Using the activity categories defined in the Activity Map on AI Safety (AMAIS) as fixed aspects, the system extracts and maps relevant activities, then produces, for each aspect, a short summary of each document, a brief comparison, and a similarity score. We assess the stability and validity of LLM-based crosswalk analysis across public policy documents. Using five large language models, we perform crosswalks on ten publicly available documents and visualize mean similarity scores with a heatmap. The results show that model choice substantially affects the crosswalk outcomes and that some document pairs yield large disagreements across models. A human evaluation by three experts on two document pairs shows high inter-annotator agreement, while model scores still differ from human judgments. These findings support the use of the framework for comparative inspection of policy documents.

Paper Structure

This paper contains 24 sections, 3 equations, 6 figures, 6 tables, and 1 algorithm.

Figures (6)

  • Figure 1: Overview of the crosswalk for policy documents. AI safety policy documents issued by different institutions are compared using a shared taxonomy. For each aspect, the output includes a summary of each document, a comparison, and a similarity score.
  • Figure 2: Heatmap of similarity scores where rows correspond to AI-safety activity items and columns correspond to Document $D_2$ (Document $D_1$ is fixed to UK-AISI). Each value is averaged over results from five models.
  • Figure 3: Heatmap of the standard deviation of similarity scores where rows correspond to AI-safety activity items and columns correspond to Document $D_2$ (Document $D_1$ is fixed to UK-AISI). Lower values indicate closer agreement among the results produced by five AI models.
  • Figure 4: Heatmap comparing crosswalk results produced by pairs of AI models (out of five). Rows correspond to one model and columns to another. Each cell reports the mean absolute difference between the two models' crosswalk results, averaged over nine document pairs (Document $D_1$ fixed; Document $D_2$ varying across nine documents) and 15 activity items.
  • Figure 5: Mean absolute difference between human annotator scores and LLM-produced crosswalk similarity scores, averaged over the 15 AMAIS activity items and the two document pairs of $(A, D)$ and $(A, E)$.
  • ...and 1 more figure
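The aggregations behind Figures 2-4 can be sketched in a few lines. The following is a minimal illustration, not the authors' code: the data layout (a models × documents × items score array) and the random placeholder scores are assumptions, but the statistics match the captions: per-cell means over the five models (Fig. 2), per-cell standard deviations as a disagreement measure (Fig. 3), and mean absolute differences between model pairs averaged over document pairs and activity items (Fig. 4).

```python
import numpy as np

rng = np.random.default_rng(0)
n_models, n_docs, n_items = 5, 9, 15  # 5 LLMs, 9 comparison documents, 15 AMAIS activity items

# scores[m, d, i]: similarity score from model m for the pair
# (fixed Document D1, Document D2 = d) on activity item i.
# Random integers stand in for real crosswalk outputs.
scores = rng.integers(1, 6, size=(n_models, n_docs, n_items)).astype(float)

# Fig. 2: mean similarity over the five models, one value per (document, item) cell.
mean_heatmap = scores.mean(axis=0)            # shape (n_docs, n_items)

# Fig. 3: cross-model standard deviation; lower values mean closer agreement.
std_heatmap = scores.std(axis=0)              # shape (n_docs, n_items)

# Fig. 4: mean absolute difference between every pair of models,
# averaged over all document pairs and activity items.
mad = np.abs(scores[:, None] - scores[None, :]).mean(axis=(2, 3))  # shape (n_models, n_models)
```

For plotting, `mean_heatmap` and `std_heatmap` would be transposed so that rows correspond to activity items and columns to documents, as in the figure captions.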