Life-Cycle Routing Vulnerabilities of LLM Router

Qiqi Lin, Xiaoyang Ji, Shengfang Zhai, Qingni Shen, Zhi Zhang, Yuejian Fang, Yansong Gao

TL;DR

The paper investigates security vulnerabilities of LLM routers across their life cycle, addressing both inference-time adversarial attacks and training-time backdoor attacks. It presents a formal framework for router architectures and evaluates white-box and black-box adversarial robustness as well as backdoor robustness across representative routing models. Key findings show that DNN-based routers are most vulnerable to adversarial and backdoor threats, while training-free routers exhibit notably stronger robustness, highlighting a trade-off between routing performance and security. These results establish a benchmark for secure LLM routing and offer guidance for designing more robust routing strategies in practical, resource-constrained deployments.

Abstract

Large language models (LLMs) have achieved remarkable success in natural language processing, yet their performance and computational costs vary significantly. LLM routers play a crucial role in dynamically balancing these trade-offs. While previous studies have primarily focused on routing efficiency, security vulnerabilities throughout the entire LLM router life cycle, from training to inference, remain largely unexplored. In this paper, we present a comprehensive investigation into the life-cycle routing vulnerabilities of LLM routers. We evaluate both white-box and black-box adversarial robustness, as well as backdoor robustness, across several representative routing models under extensive experimental settings. Our experiments uncover several key findings: 1) Mainstream DNN-based routers tend to exhibit the weakest adversarial and backdoor robustness, largely due to their strong feature extraction capabilities that amplify vulnerabilities during both training and inference; 2) Training-free routers demonstrate the strongest robustness across different attack types, benefiting from the absence of learnable parameters that can be manipulated. These findings highlight critical security risks spanning the entire life cycle of LLM routers and provide insights for developing more robust models.

Paper Structure

This paper contains 18 sections, 9 equations, 5 figures, 3 tables.

Figures (5)

  • Figure 1: Overview of adversarial attacks. In white-box attacks, the attacker extracts gradient information from the LLM router to construct a universal trigger. In black-box attacks, the attacker uses local LLM routers to estimate the strong model's win rate and leverages GPT-4o to extract complex problem features, forming a universal trigger. By using the trigger as a prefix, the attacker hijacks the LLM router to select the strong model for answering simple questions.
  • Figure 2: Overview of backdoor attacks. The attacker injects malicious rating data with triggers through publicly available LLM rating platforms, such as Chatbot Arena. When this data is used to train the LLM router, the attacker can use the trigger to prompt the router to select a strong model for answering simple questions, resulting in resource wastage.
  • Figure 3: Routing decision boundary diagram. The X and Y axes represent the first and second principal components after PCA dimensionality reduction. The red region indicates that the router tends to select the weak model, while the blue region indicates a preference for the strong model. Blue points represent clean samples, and red points represent backdoor samples.
  • Figure 4: Examples of High Win Rate Data Trigger Extraction. This figure illustrates examples of triggers extracted using GPT-4o, categorized into three distinct types: analysis-related, transformation-related, and step-by-step operation-related. Each type of trigger focuses on different structural and linguistic cues, such as analyzing sentence structure, transforming text formats, or guiding step-by-step text manipulation.
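
The prefix-trigger hijack described in the Figure 1 caption can be illustrated with a toy sketch. Everything here is an assumption made for illustration: the `toy_win_rate` scorer, the `CUES` word set, the 0.5 threshold, and the trigger string are hypothetical stand-ins, not the routers, triggers, or attack procedure evaluated in the paper. The sketch only shows the general shape of the threat: a router sends a query to the strong model when its predicted win rate crosses a threshold, and a prefix that raises that prediction flips the decision for a simple question.

```python
from typing import Callable

# Hypothetical "complexity cue" words; a real router learns its features from data.
CUES = {"analyze", "step-by-step", "transform", "prove"}


def toy_win_rate(query: str) -> float:
    """Toy stand-in for a router's predicted strong-model win rate."""
    words = [w.strip(".,:;?!") for w in query.lower().split()]
    hits = sum(w in CUES for w in words)
    # Base score of 0.2, plus 0.3 per cue word, capped at 1.0.
    return min(1.0, 0.2 + 0.3 * hits)


def route(query: str,
          win_rate: Callable[[str], float],
          threshold: float = 0.5) -> str:
    """Pick the strong model when the predicted win rate meets the threshold."""
    return "strong" if win_rate(query) >= threshold else "weak"


def with_trigger(query: str, trigger: str) -> str:
    """Prepend a universal trigger to a query, as in a prefix attack."""
    return f"{trigger} {query}"


simple_query = "What is 2 + 2?"
trigger = "Analyze step-by-step:"  # hypothetical trigger, not taken from the paper

clean_decision = route(simple_query, toy_win_rate)
attacked_decision = route(with_trigger(simple_query, trigger), toy_win_rate)
print(clean_decision, attacked_decision)
```

In this toy setting the clean query scores below the threshold and goes to the weak model, while the same query with the prefix scores above it and goes to the strong model. A learned router is far harder to characterize than this keyword scorer, so the sketch conveys the mechanism only and says nothing about attack success rates.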