Risk Taxonomy, Mitigation, and Assessment Benchmarks of Large Language Model Systems

Tianyu Cui, Yanling Wang, Chuanpu Fu, Yong Xiao, Sijia Li, Xinhao Deng, Yunpeng Liu, Qinglin Zhang, Ziyi Qiu, Peiyang Li, Zhixing Tan, Junwu Xiong, Xinyu Kong, Zujie Wen, Ke Xu, Qi Li

TL;DR

The paper tackles safety and security risks in LLM systems by introducing a module-oriented taxonomy spanning input, LM, toolchain, and output components. It synthesizes risk categories and mitigations, reviews risk assessment benchmarks, and discusses practical guidance for building responsible LLM systems. The work highlights how risk localization to specific modules can streamline defenses, including prompt design, privacy techniques, decoding strategies, and output safeguards. This taxonomy and benchmark review aim to support developers and organizations in evaluating and improving the safety of real-world LLM deployments.

Abstract

Large language models (LLMs) have strong capabilities in solving diverse natural language processing tasks. However, the safety and security issues of LLM systems have become a major obstacle to their widespread application. Many studies have investigated risks in LLM systems and developed corresponding mitigation strategies. Leading-edge enterprises such as OpenAI, Google, Meta, and Anthropic have also invested considerable effort in responsible LLMs. Therefore, there is a growing need to organize the existing studies and establish comprehensive taxonomies for the community. In this paper, we delve into four essential modules of an LLM system: an input module for receiving prompts, a language model trained on extensive corpora, a toolchain module for development and deployment, and an output module for exporting LLM-generated content. Based on this, we propose a comprehensive taxonomy that systematically analyzes the potential risks associated with each module of an LLM system and discusses the corresponding mitigation strategies. Furthermore, we review prevalent benchmarks, aiming to facilitate the risk assessment of LLM systems. We hope that this paper can help LLM participants embrace a systematic perspective to build responsible LLM systems.

Paper Structure (28 sections, 7 figures, 5 tables)

Figures (7)

  • Figure 1: An example of privacy leakage in an LLM system. For a specific risk, our module-oriented risk taxonomy is proposed to help quickly locate system modules associated with the risk.
  • Figure 2: The overview of an LLM system and the risks associated with each module of the LLM system. With the systematic perspective, we introduce the threat model of LLM systems from five aspects, including prompt input, language models, tools, output, and risk assessment.
  • Figure 3: The overall framework of our taxonomy for the risks of LLM systems. We focus on the risks of four LLM modules (the input module, language model module, toolchain module, and output module), covering 12 specific risks and 44 sub-categorised risk topics.
  • Figure 4: Illustration of NSFW prompts and adversarial prompts. Examples in the figure are taken from prompt2023adversarialDAN.
  • Figure 5: A brief illustration of the issues on training data and language models.
  • ...and 2 more figures