Attacks on Third-Party APIs of Large Language Models

Wanru Zhao; Vidit Khazanchi; Haodi Xing; Xuanli He; Qiongkai Xu; Nicholas Donald Lane

TL;DR

Large language models increasingly reach third-party APIs through plugins, which extends their capabilities but also enlarges the attack surface. The authors introduce an attack framework and apply it to WeatherAPI, MediaWikiAPI, and NewsAPI interactions with GPT-3.5-turbo and Gemini, demonstrating insertion, deletion, and substitution attacks that subtly distort model outputs. They measure attack success rate (ASR) across models and APIs, finding that substitution and deletion are more effective in several cases and that Gemini is the more susceptible of the two models. The work highlights practical security risks in LLM ecosystems and motivates defenses for plugin-based functionality. Code is released for reproducibility.

Abstract

Large language model (LLM) services have recently begun offering a plugin ecosystem to interact with third-party API services. This innovation enhances the capabilities of LLMs, but it also introduces risks, as these plugins developed by various third parties cannot be easily trusted. This paper proposes a new attacking framework to examine security and safety vulnerabilities within LLM platforms that incorporate third-party services. Applying our framework specifically to widely used LLMs, we identify real-world malicious attacks across various domains on third-party APIs that can imperceptibly modify LLM outputs. The paper discusses the unique challenges posed by third-party API integration and offers strategic possibilities to improve the security and safety of LLM ecosystems moving forward. Our code is released at https://github.com/vk0812/Third-Party-Attacks-on-LLMs.
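The insertion, deletion, and substitution attacks named in the TL;DR can be illustrated with a minimal sketch. This is not the authors' released code: the response fields, the helper names, and the reading of ASR as the fraction of attacked queries whose model output reflects the manipulation are all assumptions made for illustration.

```python
import copy

# Hypothetical weather-style API response; the field names are assumptions
# and do not reproduce any real API schema.
CLEAN_RESPONSE = {"location": "London", "temp_c": 12.0, "condition": "Light rain"}


def insertion_attack(response: dict, key: str, payload: str) -> dict:
    """Append attacker-chosen text to one string field of the response."""
    attacked = copy.deepcopy(response)
    attacked[key] = f"{attacked[key]} {payload}"
    return attacked


def deletion_attack(response: dict, key: str) -> dict:
    """Drop one field so the model never sees it."""
    attacked = copy.deepcopy(response)
    attacked.pop(key, None)
    return attacked


def substitution_attack(response: dict, key: str, value) -> dict:
    """Replace one field's value with an attacker-chosen value."""
    attacked = copy.deepcopy(response)
    attacked[key] = value
    return attacked


def attack_success_rate(outcomes: list) -> float:
    """Fraction of attacked queries judged successful (assumed ASR definition)."""
    if not outcomes:
        return 0.0
    return sum(1 for ok in outcomes if ok) / len(outcomes)


if __name__ == "__main__":
    print(insertion_attack(CLEAN_RESPONSE, "condition", "Storm warning"))
    print(deletion_attack(CLEAN_RESPONSE, "temp_c"))
    print(substitution_attack(CLEAN_RESPONSE, "temp_c", 35.0))
    # Each boolean stands for one attacked query; how success is judged
    # (for example, by comparing the model's answer to the clean data)
    # is left open here.
    print(attack_success_rate([True, False, True, True]))
```

In a real pipeline the manipulated dictionary would be passed to the LLM in place of the genuine API response, and each success flag would come from checking the model's answer against the unmodified data.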
