SafetyPrompts: a Systematic Review of Open Datasets for Evaluating and Improving Large Language Model Safety

Paul Röttger; Fabio Pernisi; Bertie Vidgen; Dirk Hovy

TL;DR

This paper presents the first systematic review of open LLM safety datasets, cataloguing 144 datasets (2018–2024) on SafetyPrompts.com and analyzing their purposes, formats, creation methods, languages, licenses, and publication venues. It reveals a shift toward synthetic and templated data, a dominance of English-language resources, and widespread reliance on evaluation over training data, with significant gaps in non-English and naturalistic datasets. The study also shows inconsistent use of open datasets in model releases and benchmarks, underscoring the need for standardized evaluation protocols and broader dataset sharing. By proposing a living catalogue and discussing standardization opportunities, the work aims to improve the ecological validity and comparability of LLM safety assessments across the field.

Abstract

The last two years have seen rapid growth in concerns around the safety of large language models (LLMs). Researchers and practitioners have met these concerns by creating an abundance of datasets for evaluating and improving LLM safety. However, much of this work has happened in parallel, and with very different goals in mind, ranging from the mitigation of near-term risks around bias and toxic content generation to the assessment of longer-term catastrophic risk potential. This makes it difficult for researchers and practitioners to find the most relevant datasets for their use case, and to identify gaps in dataset coverage that future work may fill. To remedy these issues, we conduct a first systematic review of open datasets for evaluating and improving LLM safety. We review 144 datasets, which we identified through an iterative and community-driven process over the course of several months. We highlight patterns and trends, such as a trend towards fully synthetic datasets, as well as gaps in dataset coverage, such as a clear lack of non-English and naturalistic datasets. We also examine how LLM safety datasets are used in practice -- in LLM release publications and popular LLM benchmarks -- finding that current evaluation practices are highly idiosyncratic and make use of only a small fraction of available datasets. Our contributions are based on SafetyPrompts.com, a living catalogue of open datasets for LLM safety, which we plan to update continuously as the field of LLM safety develops.
