ProxRouter: Proximity-Weighted LLM Query Routing for Improved Robustness to Outliers

Shivam Patel, Neharika Jali, Ankur Mallick, Gauri Joshi

TL;DR

ProxRouter makes LLM query routing more robust to outlier queries by recasting nonparametric routers ($K$Means clustering and $k$NN) as a single proximity-weighted aggregation framework. It combines minimum-variance priors $p_i(\mathbf{x})$ with an exponential proximity tilt, controlled by a tunable parameter $\tau$, to produce weights $w_i(\mathbf{x}) \propto p_i(\mathbf{x})\exp(-\phi_i(\mathbf{x})/\tau)$. These weights improve the estimate of model utility $\widehat{U}^{(m)}(\mathbf{x})$ and reduce bias without requiring outlier detection. The authors formalize a unified representation for $K$M and $k$NN routers and report substantial outlier generalization gains across 14 LLMs and 10 datasets, with $K$M-Prox and $k$NN-Prox approaching the AllSee upper bound while preserving inlier performance. In outlier settings, AUC rises (e.g., $K$M-Prox reaches up to 75.12% vs 70.68% for the base router; $k$NN-Prox reaches 68.12% vs 63.98%) with minimal routing overhead, which supports practical deployment. The framework also provides a mechanism to trigger router retraining based on model-ranking similarity, which helps keep performance stable as task distributions evolve.
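The weighting rule above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes $\phi_i(\mathbf{x})$ is a distance-like quantity (smaller means closer) between the query and the $i$-th reference (a cluster centroid or a neighbor), takes the priors $p_i$ as given inputs rather than deriving the minimum-variance form, and aggregates per-reference utilities as a weighted sum. The function names `proximity_weights` and `estimate_utility` are hypothetical.

```python
import math


def proximity_weights(priors, distances, tau):
    """Return normalized weights w_i proportional to p_i * exp(-phi_i / tau).

    priors    : prior weights p_i (assumed given; non-negative)
    distances : proximity terms phi_i (assumed distance-like: smaller = closer)
    tau       : tilt parameter; small tau favors the closest references
    """
    if tau <= 0:
        raise ValueError("tau must be positive")
    if len(priors) != len(distances) or not priors:
        raise ValueError("priors and distances must be non-empty and equal length")
    # Subtracting the minimum distance leaves the normalized weights
    # unchanged but avoids underflow when distances / tau is large.
    phi_min = min(distances)
    unnormalized = [
        p * math.exp(-(phi - phi_min) / tau) for p, phi in zip(priors, distances)
    ]
    total = sum(unnormalized)
    if total == 0:
        raise ValueError("all weights vanished; check priors")
    return [u / total for u in unnormalized]


def estimate_utility(weights, utilities):
    """Weighted sum of per-reference utility estimates (an assumed aggregation)."""
    return sum(w * u for w, u in zip(weights, utilities))


if __name__ == "__main__":
    # Hypothetical numbers: three references with equal priors.
    w = proximity_weights([1 / 3, 1 / 3, 1 / 3], [0.2, 1.0, 3.0], tau=0.5)
    print(w, estimate_utility(w, [0.9, 0.6, 0.3]))
```

In this sketch a very large `tau` returns weights close to the priors, while a very small `tau` places almost all weight on the closest reference, which corresponds to hard nearest-cluster assignment.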

Abstract

Large language model (LLM) query routers are critical to modern AI platforms as they seek to improve efficiency by assigning inference queries to accurate, yet low-cost models. Parametric routers typically use trained neural networks for LLM selection but suffer from retraining and maintenance overheads. Nonparametric routers are training-free, instead estimating LLM accuracy and cost via similarity between encodings of the input query and training set queries. However, like their parametric counterparts, nonparametric routers struggle to generalize to outlier queries, an issue exacerbated by limited diversity in training sets which are costly to expand and difficult to keep current with ever-evolving use cases. We propose ProxRouter, which applies an exponentially tilted aggregation mechanism to balance bias and variance in nonparametric routers, improving their robustness to outliers. Experiments show ProxRouter enhances outlier routing while preserving inlier performance with minimal overhead.

Paper Structure

This paper contains 53 sections, 30 equations, 13 figures, 8 tables, 1 algorithm.

Figures (13)

  • Figure 1: The Base router (a nearest neighbors router for this plot), which is trained only on inlier queries, fails to generalize to outlier tasks at test time, showing up to a $15\%$ drop in average accuracy for a given cost relative to the AllSee router trained on both inliers and outliers. Our proposed ProxRouter, although trained only on inlier queries, improves robustness to outliers and achieves a better accuracy-cost trade-off (experimental details in the experiments section).
  • Figure 2: High-dimensional query encodings projected to a low-dimensional space using t-SNE. Left: colored by task. Right: colored by cluster assignment through $K$Means clustering ($K=16$). Queries from the same task occupy compact, localized neighborhoods in encoding space, allowing clustering to recover semantically coherent regions aligned with query types.
  • Figure 3: Comparison of our proximity-weighted $K$Means clustering and $k$NN routers. While $K$Means assigns a test query encoding to the closest cluster of training query encodings to obtain the accuracy and cost estimates, $K$M-Prox takes a proximity-weighted combination of the $K$ clusters' estimates for each test query. Similarly, $k$NN-Prox takes a proximity-weighted combination of the estimates of the $k$ nearest neighbors of the test query.
  • Figure 4: Bias-variance tradeoff governed by proximity-based prioritization (see the experiments section for details). Increasing $1/\tau$ strengthens proximity weighting and raises variance; decreasing $1/\tau$ has the opposite effect. Routing performance is measured as area under the mean accuracy-cost curve (AUC), normalized by cost range.
  • Figure 5: $K$M-Prox, $K$M-Base and $K$M-AllSee router performance on (left) MedQA, HellaSwag as outlier tasks and (right) LogiQA, CommonSenseQA, BBH-BoolEx as outlier tasks. $K$M-Prox consistently improves routing performance over $K$M-Base without additional training data.
  • ...and 8 more figures
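
The routing metric named in the Figure 4 caption, area under the mean accuracy-cost curve normalized by cost range, can be sketched as follows. This is an illustrative reading of that one-line description, not the authors' evaluation code: it assumes trapezoidal integration over (cost, accuracy) points and division by the span of observed costs. The function name `normalized_auc` is hypothetical.

```python
def normalized_auc(costs, accuracies):
    """Trapezoidal area under the accuracy-cost curve, divided by the cost range.

    costs      : cost of each operating point (any consistent unit)
    accuracies : mean accuracy at each operating point
    """
    if len(costs) != len(accuracies) or len(costs) < 2:
        raise ValueError("need at least two (cost, accuracy) points")
    # Sort by cost so the curve is integrated left to right.
    points = sorted(zip(costs, accuracies))
    cost_range = points[-1][0] - points[0][0]
    if cost_range <= 0:
        raise ValueError("cost range must be positive")
    area = 0.0
    for (c0, a0), (c1, a1) in zip(points, points[1:]):
        area += 0.5 * (a0 + a1) * (c1 - c0)
    return area / cost_range


if __name__ == "__main__":
    # Hypothetical operating points for a single router.
    print(normalized_auc([1.0, 2.0, 4.0], [0.55, 0.65, 0.70]))
```

Under this normalization the value is on the same scale as accuracy, so a router with constant accuracy across all costs scores exactly that accuracy.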