A Survey on Responsible LLMs: Inherent Risk, Malicious Use, and Mitigation Strategy

Huandong Wang, Wenjie Fu, Yingzhou Tang, Zhilong Chen, Yuxi Huang, Jinghua Piao, Chen Gao, Fengli Xu, Tao Jiang, Yong Li

TL;DR

The survey addresses the dual challenges of LLMs—inherent risks (privacy leakage, hallucination, value misalignment) and malicious use (toxicity, jailbreak)—by proposing a unified four‑phase lifecycle framework for mitigation across data collection/pre-training, fine‑tuning/alignment, prompting/reasoning, and post‑processing/auditing. It aggregates techniques for privacy protection, hallucination reduction, value alignment, toxicity elimination, and jailbreak defenses, organized by lifecycle phase and risk dimension. The paper also highlights open challenges, such as cross‑phase interactions and the need for a multi‑dimensional framework, and outlines future directions including synergistic phase integration, deeper reasoning analysis, and brain‑inspired architectures. Overall, it aims to guide researchers, developers, and policymakers toward building safer, more responsible LLMs with broad real‑world impact.

Abstract

While large language models (LLMs) show significant potential for supporting numerous real-world applications and delivering positive social impact, they still face major challenges: the inherent risks of privacy leakage, hallucinated outputs, and value misalignment, and malicious use to generate toxic content or serve unethical purposes after being jailbroken. Therefore, in this survey, we present a comprehensive review of recent advancements aimed at mitigating these issues, organized across the four phases of LLM development and usage: data collection and pre-training, fine-tuning and alignment, prompting and reasoning, and post-processing and auditing. We elaborate on the recent advances for enhancing the performance of LLMs in terms of privacy protection, hallucination reduction, value alignment, toxicity elimination, and jailbreak defenses. In contrast to previous surveys that focus on a single dimension of responsible LLMs, this survey presents a unified framework that encompasses these diverse dimensions, providing a comprehensive view of enhancing LLMs to better serve real-world applications.

Paper Structure

This paper contains 26 sections, 5 figures, 9 tables.

Figures (5)

  • Figure 1: Framework of this survey. Five major dimensions of LLM responsibility are involved, divided into inherent risk (privacy, hallucination, value), and malicious use (toxicity and jailbreak). In terms of the mitigation strategies, we divide the whole process of developing and utilizing LLMs into four intervention phases.
  • Figure 2: Overview of the attack and defense methods for evaluating and improving LLMs in terms of privacy.
  • Figure 3: Potential causes of hallucinations in LLMs.
  • Figure 4: Overview of methods of value alignment for LLMs.
  • Figure 5: Illustration of jailbreak attack methods in different phases. The post-processing and auditing phase is omitted because users do not interact directly with the LLMs in this phase.