GMTRouter: Personalized LLM Router over Multi-turn User Interactions

Encheng Xie, Yihang Sun, Tao Feng, Jiaxuan You

TL;DR

GMTRouter tackles personalized LLM routing by modeling multi-turn user–LLM interactions as a heterogeneous graph and employing a lightweight inductive graph learning framework. It uses four node types (user, LLM, query, response) plus virtual turn nodes to preserve dialogue structure, with a cross-attention predictor to rank LLMs per user-query pair. The approach achieves consistent improvements in accuracy and AUC across four datasets and demonstrates strong generalization to new users with few-shot data, all while remaining computationally efficient. This work highlights the value of structured interaction modeling for scalable, user-aligned LLM deployment and suggests promising directions for few-shot personalization in routing systems.

Abstract

Large Language Model (LLM) routing has demonstrated strong capability in balancing response quality with computational cost. As users exhibit diverse preferences, personalization has attracted increasing attention in LLM routing, since even identical queries may require different models to generate responses tailored to individual needs. However, existing approaches are not fully personalized and often fail to capture the complex interactions between specific users and LLMs. Moreover, user preference data is typically scarce, noisy, and inconsistent in format, which limits the effectiveness of methods that rely solely on user-specific data. To address these challenges, we propose GMTRouter, which represents multi-turn user-LLM interactions as a heterogeneous graph with four node types: user, LLM, query, and response, thereby preserving the rich relational structure of the interaction. Through a tailored message-passing mechanism, GMTRouter learns to capture user preferences from few-shot data within a lightweight inductive graph learning framework, enabling effective personalization. Extensive experiments demonstrate that GMTRouter consistently outperforms strong baselines, achieving 0.9 to 21.6 percent higher accuracy and 0.006 to 0.309 higher AUC across multiple datasets. More importantly, we demonstrate that GMTRouter can adapt to new users and evolving preferences using only few-shot data, without extensive fine-tuning. The code for GMTRouter is publicly available at https://github.com/ulab-uiuc/GMTRouter.
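
As a rough illustration of the graph representation described above, the following sketch builds a small heterogeneous graph from multi-turn sessions. It is not taken from the GMTRouter codebase: the `HeteroGraph` class, the `build_graph` helper, the session dictionary layout, the relation names (`asks_in`, `answers_in`, `query_of`, `response_of`, `next`), and the choice to store feedback on the virtual turn node are all assumptions made for illustration.

```python
from collections import defaultdict

# Four entity node types from the abstract, plus the virtual turn node from the TL;DR.
NODE_TYPES = ("user", "llm", "query", "response", "turn")


class HeteroGraph:
    """Minimal typed graph: per-type node lists and per-relation edge lists (hypothetical)."""

    def __init__(self):
        self.nodes = {ntype: [] for ntype in NODE_TYPES}
        self.edges = defaultdict(list)  # (src_type, relation, dst_type) -> [(src_idx, dst_idx)]

    def add_node(self, ntype, payload):
        """Append a node and return its id as a (type, index) pair."""
        self.nodes[ntype].append(payload)
        return (ntype, len(self.nodes[ntype]) - 1)

    def add_edge(self, src, relation, dst):
        self.edges[(src[0], relation, dst[0])].append((src[1], dst[1]))


def build_graph(sessions):
    """Build a graph from sessions; the edge schema here is an assumption, not the paper's."""
    graph = HeteroGraph()
    seen = {}  # (type, name) -> node id, so each user and LLM appears exactly once

    def shared(ntype, name):
        if (ntype, name) not in seen:
            seen[(ntype, name)] = graph.add_node(ntype, name)
        return seen[(ntype, name)]

    for session in sessions:
        user = shared("user", session["user"])
        llm = shared("llm", session["llm"])
        previous_turn = None
        for turn in session["turns"]:
            # One virtual turn node per dialogue turn; feedback placement is assumed.
            turn_node = graph.add_node("turn", {"feedback": turn.get("feedback")})
            query = graph.add_node("query", turn["query"])
            response = graph.add_node("response", turn["response"])
            for node, relation in (
                (user, "asks_in"),
                (llm, "answers_in"),
                (query, "query_of"),
                (response, "response_of"),
            ):
                graph.add_edge(node, relation, turn_node)
            # Chain consecutive turns so the dialogue order is preserved.
            if previous_turn is not None:
                graph.add_edge(previous_turn, "next", turn_node)
            previous_turn = turn_node
    return graph


# Toy data: one user, two LLMs, three turns in total.
sessions = [
    {"user": "alice", "llm": "model-x", "turns": [
        {"query": "What is 2 + 2?", "response": "4", "feedback": 1.0},
        {"query": "And times 3?", "response": "12", "feedback": 1.0},
    ]},
    {"user": "alice", "llm": "model-y", "turns": [
        {"query": "Name a prime.", "response": "9", "feedback": 0.0},
    ]},
]
graph = build_graph(sessions)
```

In this layout, users and LLMs are shared across sessions while queries, responses, and turns are created per interaction, so a new user can be added by attaching a handful of new turns without rebuilding the rest of the graph.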

Paper Structure

This paper contains 38 sections, 4 equations, 7 figures, 10 tables, 1 algorithm.

Figures (7)

  • Figure 1: Multi-turn user-LLM Interaction History Table. Each row captures a multi-turn interaction with associated user feedback. User feedback can take various forms, including ratings, rankings, and ground-truth responses.
  • Figure 2: Significant differences exist in LLM preferences across users. The figure shows a heatmap of win rates for the 10 most popular LLMs across 10 active users in ChatBot Arena. The uneven color intensity within each row visually highlights the pronounced preference differences between users.
  • Figure 3: Overview of GMTRouter. (a) GMTRouter first extracts key entities (users, LLMs, queries, responses, and feedback) from the Interaction History Table and encodes their textual information using a PLM. (b) It then organizes these entities into a heterogeneous graph to faithfully model the relational structure of user–LLM interactions. (c) Within a lightweight inductive graph learning framework, GMTRouter learns to capture user preferences from few-shot data.
  • Figure 4: Impact of the visible data size $k$ on GMTRouter for GSM8K (left) and MMLU (right). The dashed line marks the GraphRouter baseline. Performance improves as $k$ increases and saturates once $k$ reaches 10.
  • Figure 5: Comparison of results under the old-user and new-user settings for GSM8K (left) and MMLU (right). The dashed line marks the GraphRouter baseline. Personalized performance under the new-user setting is comparable to that under the old-user setting, highlighting the strong generalization capability of the method.
  • ...and 2 more figures
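
The per-user win-rate heatmap described for Figure 2 can be approximated with a short computation over pairwise comparison records. This is a minimal sketch rather than the authors' analysis code: the `win_rates` function, the `(user, model_a, model_b, winner)` record layout, and the convention of counting a tie as half a win for each side are assumptions.

```python
from collections import defaultdict


def win_rates(battles):
    """Return {(user, model): win rate} from pairwise battle records.

    Each record is (user, model_a, model_b, winner), with winner in
    {"a", "b", "tie"}. A tie counts as half a win for both models (assumed).
    """
    wins = defaultdict(float)
    games = defaultdict(int)
    for user, model_a, model_b, winner in battles:
        games[(user, model_a)] += 1
        games[(user, model_b)] += 1
        if winner == "a":
            wins[(user, model_a)] += 1.0
        elif winner == "b":
            wins[(user, model_b)] += 1.0
        else:
            wins[(user, model_a)] += 0.5
            wins[(user, model_b)] += 0.5
    return {key: wins[key] / count for key, count in games.items()}


# Toy records with made-up users and models.
battles = [
    ("u1", "A", "B", "a"),
    ("u1", "A", "B", "b"),
    ("u1", "A", "C", "a"),
    ("u2", "A", "B", "tie"),
]
rates = win_rates(battles)
```

Arranging the resulting values with users on one axis and models on the other gives a matrix of the kind the heatmap visualizes; rows that differ strongly from one another indicate users with different model preferences.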