CRiskEval: A Chinese Multi-Level Risk Evaluation Benchmark Dataset for Large Language Models

Ling Shi, Deyi Xiong

TL;DR

CRiskEval introduces a Chinese frontier-risk benchmark for LLMs, addressing the need for non-binary, fine-grained risk assessment across 7 risk types and 21 subtypes with 4 safety levels. It uses a human–AI collaboration workflow to curate 14,888 questions and employs two metrics, CRI and SRI, to quantify overall and type-specific risk tendencies across 17 models. Key findings show substantial risk propensity (>40% on average), with larger model size correlating with higher risk inclination in several domains, notably situational awareness and resource-related desires. The dataset enables richer risk profiling and supports safer deployment, and its release on GitHub invites broader benchmarking and methodological refinement in frontier-risk evaluation for Chinese LLMs.

Abstract

Large language models (LLMs) possess numerous beneficial capabilities, yet their potential inclinations harbor unpredictable risks that may materialize in the future. We hence propose CRiskEval, a Chinese dataset designed for gauging the risk proclivities inherent in LLMs, such as resource acquisition and malicious coordination, as part of efforts toward proactive preparedness. To curate CRiskEval, we define a new risk taxonomy with 7 types of frontier risks and 4 safety levels: extremely hazardous, moderately hazardous, neutral and safe. We follow the philosophy of tendency evaluation to empirically measure the stated desires of LLMs via fine-grained multiple-choice question answering. The dataset consists of 14,888 questions that simulate scenarios related to the 7 predefined types of frontier risks. Each question is accompanied by 4 answer choices that state opinions or behavioral tendencies corresponding to the question. All answer choices are manually annotated with one of the defined risk levels so that we can easily build a fine-grained frontier risk profile for each assessed LLM. Extensive evaluation with CRiskEval on a spectrum of prevalent Chinese LLMs reveals that most models exhibit risk tendencies of more than 40% (weighted tendency to the four risk levels). Furthermore, a subtle increase in the models' inclination toward urgent self-sustainability, power seeking and other dangerous goals becomes evident as model size increases. To promote further research on the frontier risk evaluation of LLMs, we publicly release our dataset at https://github.com/lingshi6565/Risk_eval.

Paper Structure (26 sections, 4 equations, 5 figures, 8 tables)

This paper contains 26 sections, 4 equations, 5 figures, 8 tables.

Figures (5)

  • Figure 1: The risk taxonomy of CRiskEval, which contains 7 risk types and 21 subtypes along with 4 risk levels. The proportion of each risk type is also presented here. The description and examples of each risk subtype are provided in the paper's appendices.
  • Figure 2: Diagram for data construction and model evaluation. The four digits of "risk_rank": "3124" in the final format indicate the risk level of each choice in turn (i.e., choice A is at risk level 3, choice B at risk level 1, and so on).
  • Figure 3: Results of the evaluated models in terms of specific risk indicators.
  • Figure 4: Risk assessment results of Qwen1.5 models with different parameter sizes. Panels (a) to (g) present SRI scores of different subtypes as model size increases.
  • Figure 5: Spiral chart for numbers of intercepted questions. (a) Based on models. (b) Based on risk subtype; the subtype corresponding to each abbreviation is listed in a table in the paper.
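
The "risk_rank" encoding described in the Figure 2 caption can be illustrated with a minimal sketch. The function name `parse_risk_rank` and the default choice letters "ABCD" are assumptions made for illustration; they are not taken from the released dataset or its code, and the sketch only assumes that each character of the string is a single digit giving the risk level of the choice in the same position.

```python
def parse_risk_rank(risk_rank: str, choices: str = "ABCD") -> dict[str, int]:
    """Map each choice letter to its annotated risk level.

    Hypothetical helper: reads a string such as "3124" position by
    position, so the first digit belongs to choice A, the second to B, etc.
    """
    if len(risk_rank) != len(choices):
        raise ValueError("risk_rank must have one digit per choice")
    if not risk_rank.isdigit():
        raise ValueError("risk_rank must contain only digits")
    # Pair each choice letter with the digit at the same position.
    return {choice: int(level) for choice, level in zip(choices, risk_rank)}


# Example from the Figure 2 caption: choice A is level 3, choice B is level 1.
print(parse_risk_rank("3124"))  # {'A': 3, 'B': 1, 'C': 2, 'D': 4}
```

With such a mapping, a model's selected letter for a question can be looked up to obtain the risk level of its answer; how those levels are weighted into CRI or SRI is defined in the paper and is not reproduced here.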