AI Safety in Generative AI Large Language Models: A Survey

Jaymari Chua; Yun Li; Shiyi Yang; Chen Wang; Lina Yao

TL;DR

This survey presents a comprehensive, component-based taxonomy of AI safety risks in generative AI large language models (GAI-LLMs), spanning data safety, model safety, prompt safety, alignment, and safety at scale. It links safety concerns to concrete LLM methodologies such as in-context learning, prompting, and reinforcement learning from human feedback, and discusses evaluation, guardrails, and scalable oversight. Key contributions include correlating risks with specific architectural and training practices, examining the philosophical underpinnings of alignment, and outlining actionable future directions like safe retrieval-augmented generation and principled KD. The work provides a structured blueprint for researchers and practitioners to develop aligned, secure GAI-LLMs and informs policy and governance considerations for real-world deployment.

Abstract

Large Language Models (LLMs) such as ChatGPT that exhibit generative AI capabilities are seeing accelerated adoption and innovation. The increased presence of Generative AI (GAI) inevitably raises concerns about the risks and safety associated with these models. This article provides an up-to-date survey of recent trends in AI safety research on GAI-LLMs from a computer scientist's perspective: specific and technical. In this survey, we explore the background and motivation for the identified harms and risks in the context of LLMs being generative language models; our survey differs from others by emphasising the need for unified theories of the distinct safety challenges in the research, development, and application of LLMs. We start our discussion with a concise introduction to the workings of LLMs, supported by relevant literature. Then we discuss earlier research that has pointed out the fundamental constraints of generative models, or the lack of understanding thereof (e.g., performance and safety trade-offs as LLMs scale in number of parameters). We then cover LLM alignment, examining various approaches, competing methods, and the present challenges of aligning LLMs with human preferences. By highlighting the gaps in the literature and possible implementation oversights, we aim to create a comprehensive analysis that provides insights for addressing AI safety in LLMs and encourages the development of aligned and secure models. We conclude our survey by discussing future directions of LLMs for AI safety, offering insights into ongoing research in this critical area.

Paper Structure (42 sections, 4 equations, 9 figures, 4 tables)

Introduction
Strategy for Literature Search
Comparisons with Other Surveys
The Main Contributions of the Survey
The Outline of the Survey
Background
Model Architecture
Preliminary: Transformer.
Mainstream Architectures of LLMs.
In-Context Learning
Data Safety
Toxicity
Bias
Data Privacy
Copyright
...and 27 more sections

Figures (9)

Figure 1: Illustration of In-Context Learning (ICL). ICL input consists of a natural language description explaining the task, $k$ ($k \geq 0$) demonstration examples to illustrate further, and a new query.
Figure 2: The relationships between hallucination, misinformation, disinformation, and related terms.
Figure 3: Answers to the same question generated by BLOOM, ChatGPT, Bard, and GPT-4.
Figure 4: Examples of misinformation generated by LLMs. The latest LLM, GPT-4, mistakenly provides an irrelevant website link when citing a paper.
Figure 5: Text similarity evaluated by BLEU, ROUGE, and METEOR.
...and 4 more figures
