
SafeSearch: Automated Red-Teaming for the Safety of LLM-Based Search Agents

Jianshuo Dong, Sheng Guo, Hao Wang, Xun Chen, Zhuotao Liu, Tianwei Zhang, Ke Xu, Minlie Huang, Han Qiu

TL;DR

This work tackles the safety risks of LLM-based search agents, which pull real-time information from the Internet. It presents an automated red-teaming framework and the SafeSearch benchmark for evaluating, systematically and at scale, how unreliable search results can mislead agents. Extensive experiments across multiple agent scaffolds and backend LLMs reveal substantial vulnerabilities (an attack success rate, or ASR, of up to 90.5% in some settings) and show that simple reminder prompts offer limited protection, while proactive filtering helps. By enabling quantitative safety tracking and exposing defense gaps, SafeSearch provides a practical foundation for safer, more trustworthy search-enabled AI systems.

Abstract

Search agents connect LLMs to the Internet, enabling access to broader and more up-to-date information. However, unreliable search results may also pose safety threats to end users, establishing a new threat surface. In this work, we conduct two in-the-wild experiments to demonstrate both the prevalence of low-quality search results and their potential to misguide agent behavior. To counter this threat, we introduce an automated red-teaming framework that is systematic, scalable, and cost-efficient, enabling lightweight and harmless safety assessments of search agents. Building on this framework, we construct the SafeSearch benchmark, which includes 300 test cases covering five categories of risks (e.g., misinformation and indirect prompt injection). Using this benchmark, we evaluate three representative search agent scaffolds, covering search workflow, tool-calling, and deep research, across 7 proprietary and 8 open-source backend LLMs. Our results reveal substantial vulnerabilities of LLM-based search agents: when exposed to unreliable websites, the highest attack success rate (ASR) reached 90.5% for GPT-4.1-mini under a search workflow setting. Moreover, our analysis highlights the limited effectiveness of common defense practices, such as reminder prompting. This emphasizes the value of our framework in promoting transparency for safer agent development. Our codebase and test cases are publicly available: https://github.com/jianshuod/SafeSearch.
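The results above are reported as an attack success rate (ASR). As a minimal sketch, assuming each test case ends in a binary verdict from some judge (the `Outcome` record and both functions below are hypothetical illustrations, not code from the SafeSearch repository), ASR is the fraction of test cases in which the agent was misled. It can also be broken down by risk category:

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Outcome:
    # Hypothetical record for one evaluated test case (assumed structure).
    risk_category: str       # e.g. "misinformation", "indirect prompt injection"
    attack_succeeded: bool   # assumed binary verdict: was the agent misled?


def attack_success_rate(outcomes):
    """Fraction of test cases in which the attack succeeded."""
    if not outcomes:
        raise ValueError("no outcomes to score")
    return sum(o.attack_succeeded for o in outcomes) / len(outcomes)


def asr_by_category(outcomes):
    """ASR computed separately for each risk category."""
    groups = defaultdict(list)
    for o in outcomes:
        groups[o.risk_category].append(o)
    return {cat: attack_success_rate(group) for cat, group in groups.items()}


# Toy data for illustration only; these are not results from the paper.
results = [
    Outcome("misinformation", True),
    Outcome("misinformation", False),
    Outcome("indirect prompt injection", True),
    Outcome("indirect prompt injection", True),
]
print(f"overall ASR: {attack_success_rate(results):.1%}")  # 75.0%
print(asr_by_category(results))  # {'misinformation': 0.5, 'indirect prompt injection': 1.0}
```

Reporting per-category ASR alongside the overall figure makes it easier to see whether a defense such as reminder prompting helps uniformly or only for some risk types.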

Paper Structure

This paper contains 28 sections, 16 figures, 6 tables.

Figures (16)

  • Figure 1: LLM Services Can Return Unsafe Code Due to Internet-Sourced Unreliable Search Results.
  • Figure 2: Search Agent vs. RAG.
  • Figure 3: Qualitative Example: The Search Agent May Shift its Stance with Unreliable Search Results. We prefer long-tail, inconclusive questions in this experiment, which are more likely to hit low-quality websites. The search agent used in the example is Qwen3-8B with a search workflow.
  • Figure 4: Three-Step Workflow of Test Case Generation. Colored fields are required in final test cases, white solid-bordered fields are passed to later stages, and dashed-bordered fields are auxiliary.
  • Figure 5: Conceptual Illustration of Red-Teaming via Simulation-Based Testing.
  • ...and 11 more figures
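Figure 5 refers to red-teaming via simulation-based testing, which the abstract describes as lightweight and harmless. One plausible reading, stated here as an assumption rather than the paper's exact procedure, is that a generated unreliable page is mixed into the real results the agent sees, so nothing harmful is ever published online. A minimal sketch of that injection step follows; `SearchResult` and `inject_simulated_result` are hypothetical names:

```python
import random
from dataclasses import dataclass


@dataclass
class SearchResult:
    # Hypothetical search-result record (assumed fields).
    title: str
    url: str
    snippet: str


def inject_simulated_result(real_results, simulated, rng):
    """Return a copy of real_results with one simulated page at a random position.

    The simulated page exists only in the agent's view of the results;
    the original list is left unchanged.
    """
    # randint is inclusive on both ends, so the page may land first or last.
    position = rng.randint(0, len(real_results))
    mixed = list(real_results)
    mixed.insert(position, simulated)
    return mixed, position


rng = random.Random(0)  # seeded so a test case is reproducible
real = [SearchResult(f"Real page {i}", f"https://example.org/{i}", "...") for i in range(3)]
fake = SearchResult("Simulated unreliable page", "https://unreliable.example/post", "...")
mixed, pos = inject_simulated_result(real, fake, rng)
```

Randomizing the insertion position avoids the agent learning that the unreliable page always appears in the same slot; seeding the generator keeps each test case reproducible.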