SafeScientist: Toward Risk-Aware Scientific Discoveries by LLM Agents

Kunlun Zhu, Jiaxun Zhang, Ziheng Qi, Nuoxing Shang, Zijia Liu, Peixuan Han, Yue Su, Haofei Yu, Jiaxuan You

TL;DR

SafeScientist presents a risk-aware AI scientist framework that proactively refuses high-risk tasks and maintains safety via a multi-layer defense stack. It introduces SciSafetyBench to benchmark AI safety in scientific contexts and demonstrates substantial safety improvements without sacrificing scientific output. The framework combines prompt monitoring, agent collaboration oversight, tool-use constraints, and ethical review, and is validated against diverse adversarial attacks. The work lays groundwork for more trustworthy autonomous scientific discovery and provides a practical benchmark and defense toolkit for future research.

Abstract

Recent advancements in large language model (LLM) agents have significantly accelerated scientific discovery automation, yet concurrently raised critical ethical and safety concerns. To systematically address these challenges, we introduce SafeScientist, an innovative AI scientist framework explicitly designed to enhance safety and ethical responsibility in AI-driven scientific exploration. SafeScientist proactively refuses ethically inappropriate or high-risk tasks and rigorously emphasizes safety throughout the research process. To achieve comprehensive safety oversight, we integrate multiple defensive mechanisms, including prompt monitoring, agent-collaboration monitoring, tool-use monitoring, and an ethical reviewer component. Complementing SafeScientist, we propose SciSafetyBench, a novel benchmark specifically designed to evaluate AI safety in scientific contexts, comprising 240 high-risk scientific tasks across 6 domains, alongside 30 specially designed scientific tools and 120 tool-related risk tasks. Extensive experiments demonstrate that SafeScientist significantly improves safety performance by 35% compared to traditional AI scientist frameworks, without compromising scientific output quality. Additionally, we rigorously validate the robustness of our safety pipeline against diverse adversarial attack methods, further confirming the effectiveness of our integrated approach. The code and data will be available at https://github.com/ulab-uiuc/SafeScientist. Warning: this paper contains example data that may be offensive or harmful.

Paper Structure

This paper contains 32 sections, 27 figures, 8 tables.

Figures (27)

  • Figure 1: SafeScientist vs. Normal Scientist. Unlike a normal AI scientist that may respond unsafely to malicious or risky prompts, SafeScientist can reject harmful queries and responsibly handle high-risk topics under safety-aware guidance.
  • Figure 2: Overview of SafeScientist. An end-to-end pipeline from task to paper, integrating input detection, discussion, tool use, and writing stages, with SciSafetyBench-based attack/defense evaluation for scientific AI safety.
  • Figure 3: SciSafetyBench consists of 240 tasks across six domains and four risk types, giving a comprehensive evaluation of how well AI scientists handle risky tasks.
  • Figure 4: Ethical Score Comparison Across Domains. This bar chart compares the average ethical scores of AI-generated draft papers and their refined versions across six scientific domains. The refined papers consistently demonstrate improved ethical adherence.
  • Figure 5: Construct Scientist Prompt.
  • ...and 22 more figures