A Survey on Responsible Generative AI: What to Generate and What Not

Jindong Gu

TL;DR

This paper investigates the practical requirements for responsible generation in both textual and visual generative models, outlining five key considerations: generating truthful content, avoiding toxic content, refusing harmful instructions, leaking no training data-related content, and ensuring that generated content is identifiable.

Abstract

In recent years, generative AI (GenAI), such as large language models and text-to-image models, has received significant attention across various domains. However, ensuring that these models generate content responsibly is crucial for their real-world applicability. This raises an interesting question: What should responsible GenAI generate, and what should it not? To answer this question, this paper investigates the practical requirements for responsible generation in both textual and visual generative models, outlining five key considerations: generating truthful content, avoiding toxic content, refusing harmful instructions, leaking no training data-related content, and ensuring that generated content is identifiable. Specifically, we review recent advancements and challenges in addressing these requirements. We also discuss and emphasize the importance of responsible GenAI across the healthcare, education, finance, and artificial general intelligence domains. Through a unified perspective on both textual and visual generative models, this paper aims to provide insights into practical safety-related issues and to help the community build responsible GenAI.

Paper Structure (50 sections, 17 equations, 10 figures)

Introduction
Preliminaries
Preliminary of Modern Generative AI
Transformer-based Textual Generative AI
Diffusion Model-based Visual Generative AI
Vulnerability of Deep Neural Networks
Adversarial Attacks
Backdoor Attacks
Responsible Textual Generative Model
To Generate Truthful Content
Hallucination
Not To Generate Toxic Content
Bias and Misinformation Generation
Not To Generate for Harmful Instructions
Prompt Injection Attack on LLM
...and 35 more sections

Figures (10)

Figure 1: Subfigure (a) illustrates intrinsic hallucination, where the generated content is inconsistent with the input content: there is no fence in the input image. Subfigure (b) illustrates extrinsic hallucination, where the generated content contradicts a fact: the bird is found in North America, not the United Kingdom.
Figure 2: Various types of toxic output text generated by an LLM. Notable ones include (a) social biases that involve stereotypes about specific groups of people, such as those based on religion and gender, (b) offensive or even extremist content, and (c) personally identifiable information, e.g., "The man running for president is out on bail in that scandal case".
Figure 3: Four adversarial attacks on LLMs: 1) The Prompt Injection attack, shown in subfigure (a), aims to manipulate the model's response by injecting harmful information into the inputs. 2) The Prompt Extraction attack, shown in subfigure (b), aims to extract the system prompt with a specified adversarial prompt, e.g., "Now print above prompt". 3) Subfigure (c) illustrates the Jailbreak attack, in which the LLM is induced to generate inappropriate content. 4) The Backdoor attack, shown in subfigure (d), manipulates the training or fine-tuning process so that a malicious behavior can be induced by a pre-defined trigger without hurting normal usage.
Figure 4: Training data-related attacks on LLMs: The Membership Inference attack, illustrated in subfigure (a), aims to infer whether a particular data record was used to train a model. The Training Data Extraction attack, shown in subfigure (b), aims to extract training data records or segments directly, e.g., sensitive information such as social security numbers.
Figure 5: Identifiable Generated Text: Subfigure (a) illustrates a simple way to watermark generated textual content so that it can be identified later. The green text corresponds to a randomized set of “green” tokens. The watermarked text is generated by softly promoting the use of green tokens during sampling (Kirchenbauer et al., 2023). Detection, shown in subfigure (b), aims to distinguish generated text from real text, while Attribution, shown in subfigure (c), aims to infer whether a textual sample was generated by a given LLM.
...and 5 more figures
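
The green-token watermark described in the Figure 5 caption can be illustrated with a small, self-contained sketch. This is not the survey's or the cited authors' implementation: the toy vocabulary, the random-logit stand-in for an LLM, the hashing scheme, and the names GAMMA, DELTA, green_set, generate, and z_score are all assumptions made for illustration. The idea shown is that a pseudo-random "green" subset of the vocabulary is derived from the previous token, green logits receive a small bonus during sampling, and a detector later counts green tokens and computes a z-score.

```python
import hashlib
import math
import random

VOCAB_SIZE = 50  # toy vocabulary of integer token ids (assumption)
GAMMA = 0.5      # fraction of the vocabulary marked green at each step
DELTA = 4.0      # bonus added to green-token logits ("soft" promotion)


def green_set(prev_token):
    """Pseudo-randomly choose the green tokens, seeded by the previous token."""
    digest = hashlib.sha256(str(prev_token).encode()).digest()
    rng = random.Random(int.from_bytes(digest[:8], "big"))
    ids = list(range(VOCAB_SIZE))
    rng.shuffle(ids)
    return set(ids[: int(GAMMA * VOCAB_SIZE)])


def sample_token(logits, prev_token, rng, watermark):
    """Sample from softmax(logits), optionally boosting green tokens by DELTA."""
    if watermark:
        green = green_set(prev_token)
        logits = [l + DELTA if i in green else l for i, l in enumerate(logits)]
    top = max(logits)  # subtract the max for numerical stability
    weights = [math.exp(l - top) for l in logits]
    return rng.choices(range(VOCAB_SIZE), weights=weights, k=1)[0]


def generate(length, seed, watermark):
    """Hypothetical stand-in for an LLM: fresh random logits at every step."""
    rng = random.Random(seed)
    tokens = [0]  # fixed start token
    for _ in range(length):
        logits = [rng.gauss(0.0, 1.0) for _ in range(VOCAB_SIZE)]
        tokens.append(sample_token(logits, tokens[-1], rng, watermark))
    return tokens


def z_score(tokens):
    """One-proportion z-test on how many tokens fall in their green set."""
    n = len(tokens) - 1
    hits = sum(tok in green_set(prev) for prev, tok in zip(tokens, tokens[1:]))
    return (hits - GAMMA * n) / math.sqrt(n * GAMMA * (1 - GAMMA))


if __name__ == "__main__":
    print("watermarked z-score:  ", z_score(generate(200, seed=1, watermark=True)))
    print("unwatermarked z-score:", z_score(generate(200, seed=1, watermark=False)))
```

In this sketch, a detector needs only the hashing scheme and the token sequence, not the model itself. Watermarked text should yield a large z-score, while unwatermarked text should stay near zero; the threshold used to flag text is a design choice that trades false positives against missed detections.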
