Unsafe LLM-Based Search: Quantitative Analysis and Mitigation of Safety Risks in AI Web Search

Zeren Luo, Zifan Peng, Yule Liu, Zhen Sun, Mingchen Li, Jingyi Zheng, Xinlei He

TL;DR

This work provides the first quantitative safety risk analysis of seven production AI-powered search engines (AIPSEs), revealing that harmful content and malicious URLs frequently appear in responses even for benign queries. It introduces a formal threat model and three query types, builds a dataset from PhishTank, ThreatBook, and LevelBlue, and systematically evaluates risk across production AIPSEs versus traditional search engines. The study finds that natural-language queries slightly reduce risk, while querying a URL directly increases it, and presents two real-world case studies illustrating deception vectors. To mitigate these risks, the authors propose an agent-based defense combining a content refinement tool and multiple URL detectors, with HtmlLLM-Detector delivering the strongest protection (F1=0.822; 78.3% risk reduction). The results underscore an urgent need for robust safety filters in AIPSEs and provide practical defense mechanisms and evaluation benchmarks for future work.

Abstract

Recent advancements in Large Language Models (LLMs) have significantly enhanced the capabilities of AI-Powered Search Engines (AIPSEs), offering precise and efficient responses by integrating external databases with pre-existing knowledge. However, we observe that these AIPSEs raise risks such as quoting malicious content or citing malicious websites, leading to harmful or unverified information dissemination. In this study, we conduct the first safety risk quantification on seven production AIPSEs by systematically defining the threat model and risk types, and by evaluating responses to various query types. With data collected from PhishTank, ThreatBook, and LevelBlue, our findings reveal that AIPSEs frequently generate harmful content that contains malicious URLs even with benign queries (e.g., with benign keywords). We also observe that directly querying a URL will increase the number of main risk-inclusive responses, while querying with natural language will slightly mitigate such risk. Compared to traditional search engines, AIPSEs outperform them in both utility and safety. We further perform two case studies on online document spoofing and phishing to show the ease of deceiving AIPSEs in real-world settings. To mitigate these risks, we develop an agent-based defense with a GPT-4.1-based content refinement tool and a URL detector. Our evaluation shows that our defense can effectively reduce the risk, with only a minor cost of reducing available information by approximately 10.7%. Our research highlights the urgent need for robust safety measures in AIPSEs.
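To make the idea of a user-end URL filter concrete, the following is a minimal sketch and not the authors' implementation. It assumes a plain hostname blocklist as a stand-in for the paper's URL detector, and every name in it (`extract_urls`, `blocklist_detector`, `filter_response`, the `evil.example` host) is hypothetical. It only shows the general shape: find URLs in a response, ask a detector about each one, and replace flagged links with a placeholder.

```python
import re
from dataclasses import dataclass
from typing import Callable, List, Set
from urllib.parse import urlparse

# Simple URL matcher; an assumption for illustration, not a full URL grammar.
# Note: a URL that ends a sentence will capture the trailing period.
URL_PATTERN = re.compile(r"https?://[^\s)\]>\"']+")


@dataclass
class FilteredResponse:
    text: str                # response text with flagged links replaced
    removed_urls: List[str]  # URLs the detector flagged


def extract_urls(text: str) -> List[str]:
    """Return every URL-like substring found in the response text."""
    return URL_PATTERN.findall(text)


def blocklist_detector(blocklist: Set[str]) -> Callable[[str], bool]:
    """Hypothetical detector: flags a URL when its hostname is blocklisted."""
    def is_malicious(url: str) -> bool:
        host = (urlparse(url).hostname or "").lower()
        return host in blocklist
    return is_malicious


def filter_response(
    text: str,
    is_malicious: Callable[[str], bool],
    placeholder: str = "[link removed]",
) -> FilteredResponse:
    """Replace each flagged URL with a placeholder and record what was removed."""
    removed: List[str] = []

    def _replace(match: "re.Match[str]") -> str:
        url = match.group(0)
        if is_malicious(url):
            removed.append(url)
            return placeholder
        return url

    return FilteredResponse(URL_PATTERN.sub(_replace, text), removed)


if __name__ == "__main__":
    detector = blocklist_detector({"evil.example"})
    response = (
        "See https://evil.example/login and "
        "https://docs.python.org/3/ for details."
    )
    result = filter_response(response, detector)
    print(result.text)
    print(result.removed_urls)
```

Because the detector is passed in as a callable, a stronger classifier (for example one that inspects page HTML) could be swapped in without changing the filtering logic. Removing links also removes information, which is the kind of trade-off the abstract quantifies for the authors' own defense.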

Paper Structure

This paper contains 28 sections, 9 figures, 3 tables.

Figures (9)

  • Figure 1: Overall Process of Our Work: We collect 100 websites and their corresponding keyword lists as the evaluation dataset (see the data collection section for more details). Then, we evaluate seven representative AIPSEs on this dataset to reveal their safety risks (see the experiment and results sections). We also conduct two case studies about malicious online documents and phishing websites to demonstrate the feasibility of deceiving production AIPSEs (see the case study section). Lastly, we propose a simple yet effective agent-based defense strategy at the user end to help filter unsafe responses (see the defense section).
  • Figure 2: Typical AIPSE Response: A typical AIPSE response consists of three integral components: answer, references, and sources.
  • Figure 3: Pipeline for Query Generation: The workflow of two types of queries based on the keyword list query.
  • Figure 4: Risk Comparison for Natural Language and Keyword Queries Across AIPSEs: Results by risk type when using natural language (NL) and keyword list (KW) queries across representative AIPSEs. *Copilot does not include sources; therefore, there are no "Source" type keyword list queries or URLs.
  • Figure 5: Risk for Keyword List Query Across AIPSEs: The number of URLs when querying with the keyword list across representative AIPSEs. *Copilot does not include sources; therefore, there are no low-risk URLs.
  • ...and 4 more figures