
CONCUR: A Framework for Continual Constrained and Unconstrained Routing

Peter Baile Chen, Weiyue Li, Dan Roth, Michael Cafarella, Samuel Madden, Jacob Andreas

TL;DR

CONCUR tackles continual routing of AI tasks to diverse computation strategies by learning, for each strategy, separate predictors of accuracy and cost. The predictors combine two input representations (general-purpose and task-specific), and their estimates feed an optimization step that performs either constrained or unconstrained routing. Because each strategy has its own predictors, new strategies can be added without retraining the existing ones. On in-distribution and out-of-distribution, reasoning-intensive tasks, CONCUR outperforms both the best single strategy and existing routers, with higher end-to-end accuracy and lower inference cost, and it reduces training cost in continual settings. The result is a budget-aware routing framework for workloads in which the pool of models and decoding methods keeps changing.

Abstract

AI tasks differ in complexity and are best addressed with different computation strategies (e.g., combinations of models and decoding methods). Hence, an effective routing system that maps tasks to the appropriate strategies is crucial. Most prior methods build the routing framework by training a single model across all strategies, which demands full retraining whenever new strategies appear and leads to high overhead. Attempts at such continual routing, however, often face difficulties with generalization. Prior models also typically use a single input representation, limiting their ability to capture the full complexity of the routing problem and leading to sub-optimal routing decisions. To address these gaps, we propose CONCUR, a continual routing framework that supports both constrained and unconstrained routing (i.e., routing with or without a budget). Our modular design trains a separate predictor model for each strategy, enabling seamless incorporation of new strategies with low additional training cost. Our predictors also leverage multiple representations of both tasks and computation strategies to better capture overall problem complexity. Experiments on both in-distribution and out-of-distribution, knowledge- and reasoning-intensive tasks show that our method outperforms the best single strategy and strong existing routing techniques with higher end-to-end accuracy and lower inference cost in both continual and non-continual settings, while also reducing training cost in the continual setting.
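
The modular design described above can be illustrated with a minimal sketch. It is not the authors' implementation: the `StrategyPredictors` class, the toy length-based predictors, the strategy names, the scoring rule "predicted accuracy minus `w` times predicted cost", and the brute-force budgeted search are all assumptions made for illustration, since this excerpt does not give the paper's actual objective or optimizer.

```python
from dataclasses import dataclass
from itertools import product
from typing import Callable, Dict, List, Optional


@dataclass
class StrategyPredictors:
    """Hypothetical pair of predictors owned by a single computation strategy."""
    accuracy: Callable[[str], float]  # estimated chance the strategy solves the task
    cost: Callable[[str], float]      # estimated inference cost, arbitrary units


Registry = Dict[str, StrategyPredictors]


def route_unconstrained(task: str, registry: Registry, w: float) -> str:
    """Pick the strategy with the best accuracy/cost trade-off (assumed scoring rule)."""
    def score(name: str) -> float:
        p = registry[name]
        return p.accuracy(task) - w * p.cost(task)
    return max(registry, key=score)


def route_constrained(tasks: List[str], registry: Registry,
                      budget: float) -> Optional[List[str]]:
    """Maximize total predicted accuracy with total predicted cost <= budget.

    Exhaustive search over all assignments: fine for a toy example,
    exponential in the number of tasks. Returns None if nothing fits.
    """
    best, best_acc = None, float("-inf")
    for assignment in product(list(registry), repeat=len(tasks)):
        pairs = list(zip(assignment, tasks))
        if sum(registry[n].cost(t) for n, t in pairs) > budget:
            continue
        total_acc = sum(registry[n].accuracy(t) for n, t in pairs)
        if total_acc > best_acc:
            best, best_acc = list(assignment), total_acc
    return best


def toy(short_acc: float, long_acc: float, cost: float) -> StrategyPredictors:
    """Stand-in for trained predictors: accuracy depends only on task length."""
    return StrategyPredictors(
        accuracy=lambda task: short_acc if len(task) < 20 else long_acc,
        cost=lambda task: cost,
    )


registry: Registry = {
    "small_greedy": toy(0.90, 0.40, 1.0),
    "large_cot": toy(0.95, 0.80, 5.0),
}
# Continual setting: a new strategy only adds its own entry;
# the predictors registered earlier are left untouched.
registry["large_greedy"] = toy(0.92, 0.60, 3.0)

easy = "2+2?"
hard = "Prove that the sum of two odd numbers is even."
print(route_unconstrained(easy, registry, w=0.1))
print(route_constrained([easy, hard], registry, budget=6.0))
```

In this sketch, adding a strategy costs one new registry entry, which mirrors the claim that new strategies can be incorporated without retraining the existing predictors. A practical constrained router would replace the exhaustive search with a proper solver.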

Paper Structure

This paper contains 29 sections, 9 equations, 6 figures, 12 tables.

Figures (6)

  • Figure 1: CONCUR learns one predictor per computation strategy that uses multiple input representations to support continual routing and better routing decisions under both continual and non-continual settings.
  • Figure 2: Overall predictor architecture of CONCUR. For each computation strategy $s_j$, we train two predictor models: one estimates the accuracy of applying $s_j$ to the input task, and the other estimates its cost, using both general-purpose and task-specific representations.
  • Figure 3: Pareto curves for unconstrained routing on both in- and out-of-distribution datasets across various values of $w$ defined in \ref{sec:routing}, illustrating the trade-off between accuracy and cost. Full diagrams are available in \ref{app:diagram}.
  • Figure 4: Performance of different methods under continual routing with different collections of strategies. XFS denotes method X trained from scratch. XFT(Y%) denotes method X fine-tuned, using Y% of the new data, from its prior version, which was trained from scratch in Setting 1.
  • Figure 5: Performance of all methods for unconstrained routing on both in- and out-of-distribution datasets across various values of $w$ defined in \ref{sec:routing}, illustrating the trade-off between accuracy and cost.
  • ...and 1 more figure
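
The accuracy-cost curves mentioned in the Figure 3 and Figure 5 captions can be approximated with a short sketch that sweeps the weight $w$. The definition of $w$ is in a section not included here, so the score "accuracy minus $w$ times cost" is an assumption, and the per-task numbers and the names `sweep_tradeoff` and `pareto_front` are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical per-task estimates: strategy name -> (accuracy, cost).
Estimates = Dict[str, Tuple[float, float]]


def sweep_tradeoff(tasks: List[Estimates],
                   weights: List[float]) -> List[Tuple[float, float, float]]:
    """For each w, route every task and report (w, mean cost, mean accuracy)."""
    points = []
    for w in weights:
        # Assumed rule: choose the strategy maximizing accuracy - w * cost.
        chosen = [max(est.values(), key=lambda ac: ac[0] - w * ac[1])
                  for est in tasks]
        mean_acc = sum(acc for acc, _ in chosen) / len(chosen)
        mean_cost = sum(cost for _, cost in chosen) / len(chosen)
        points.append((w, mean_cost, mean_acc))
    return points


def pareto_front(points: List[Tuple[float, float]]) -> List[Tuple[float, float]]:
    """Keep (cost, accuracy) points that no other point matches or beats on both axes."""
    front = [p for p in points
             if not any(q != p and q[0] <= p[0] and q[1] >= p[1] for q in points)]
    return sorted(front)


tasks: List[Estimates] = [
    {"small": (0.90, 1.0), "large": (0.95, 5.0)},  # easy task
    {"small": (0.40, 1.0), "large": (0.80, 5.0)},  # hard task
]
points = sweep_tradeoff(tasks, weights=[0.0, 0.05, 0.2])
front = pareto_front([(cost, acc) for _, cost, acc in points])
for cost, acc in front:
    print(f"mean cost {cost:.2f} -> mean accuracy {acc:.3f}")
```

Larger values of `w` push the router toward cheaper strategies, so each value of `w` contributes one point on the curve, and the points that survive `pareto_front` trace the trade-off.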