SEAR: Schema-Based Evaluation and Routing for LLM Gateways

Zecheng Zhang, Han Zheng, Yue Xu

Abstract

Evaluating production LLM responses and routing requests across providers in LLM gateways requires fine-grained quality signals and operationally grounded decisions. To provide both, we present SEAR, a schema-based evaluation and routing system for multi-model, multi-provider LLM gateways. SEAR defines an extensible relational schema that covers both LLM evaluation signals (context, intent, response characteristics, issue attribution, and quality scores) and gateway operational metrics (latency, cost, throughput), with cross-table consistency links, spanning around one hundred typed, SQL-queryable columns. To populate the evaluation signals reliably, SEAR introduces self-contained signal instructions, in-schema reasoning, and multi-stage generation, which together produce database-ready structured outputs. Because signals are derived through LLM reasoning rather than shallow classifiers, SEAR captures complex request semantics, yields human-interpretable routing explanations, and unifies evaluation and routing in a single query layer. Across thousands of production sessions, SEAR achieves strong signal accuracy on human-labeled data and supports practical routing decisions, including large cost reductions at comparable quality.
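The central idea of the abstract, storing evaluation signals and gateway metrics in one relational schema so that a routing decision becomes a SQL query, can be illustrated with a minimal sketch. It assumes a drastically reduced schema: one hypothetical evaluation table (`request_eval`) and one gateway metrics table (`gateway_metrics`) whose nullable `eval_id` column stands in for the optional foreign key described for Figure 1. All table names, column names, model names, and data values are illustrative assumptions rather than SEAR's actual schema, and the routing rule (cheapest model whose mean quality on an intent clears a threshold) is only one plausible policy.

```python
import sqlite3
from typing import Optional

# Hypothetical, heavily reduced schema. SEAR's real schema has roughly one
# hundred typed columns across four evaluation tables and a gateway metrics table.
SCHEMA = """
CREATE TABLE request_eval (
    request_id    TEXT PRIMARY KEY,
    intent        TEXT NOT NULL,   -- judge-assigned intent label
    quality_score REAL NOT NULL    -- judge-assigned score in [0, 1]
);
CREATE TABLE gateway_metrics (
    request_id TEXT PRIMARY KEY,
    -- Optional link: NULL for requests that were not sampled for evaluation.
    eval_id    TEXT REFERENCES request_eval(request_id),
    model      TEXT NOT NULL,
    latency_ms REAL NOT NULL,
    cost_usd   REAL NOT NULL
);
"""

# Assumed policy: cheapest model (by mean cost) whose mean quality on the
# given intent is at least the threshold. The inner JOIN drops unevaluated rows.
ROUTE_QUERY = """
SELECT g.model, AVG(g.cost_usd) AS avg_cost
FROM gateway_metrics AS g
JOIN request_eval AS e ON e.request_id = g.eval_id
WHERE e.intent = ?
GROUP BY g.model
HAVING AVG(e.quality_score) >= ?
ORDER BY avg_cost ASC
LIMIT 1;
"""


def build_demo_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
    conn.executescript(SCHEMA)
    # Made-up rows for illustration only.
    conn.executemany(
        "INSERT INTO request_eval VALUES (?, ?, ?)",
        [("r1", "coding", 0.90), ("r2", "coding", 0.85),
         ("r3", "coding", 0.60), ("r4", "coding", 0.70)],
    )
    conn.executemany(
        "INSERT INTO gateway_metrics VALUES (?, ?, ?, ?, ?)",
        [("r1", "r1", "model-a", 900.0, 0.0100),
         ("r2", "r2", "model-a", 850.0, 0.0120),
         ("r3", "r3", "model-b", 300.0, 0.0010),
         ("r4", "r4", "model-b", 320.0, 0.0012),
         ("r5", None, "model-b", 310.0, 0.0011)],  # not sampled for evaluation
    )
    return conn


def route(conn: sqlite3.Connection, intent: str, min_quality: float) -> Optional[str]:
    """Return the chosen model name, or None if no model meets the threshold."""
    row = conn.execute(ROUTE_QUERY, (intent, min_quality)).fetchone()
    return None if row is None else row[0]


conn = build_demo_db()
```

With these made-up rows, `route(conn, "coding", 0.8)` selects `model-a`, the only model whose mean quality (0.875) clears the bar; lowering the threshold to 0.6 lets the cheaper `model-b` (mean quality about 0.65) win. Keeping the policy in SQL means the same tables that answer evaluation questions also explain each routing choice.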

Paper Structure

This paper contains 58 sections, 2 equations, 13 figures, 13 tables, 1 algorithm.

Figures (13)

  • Figure 1: SEAR system architecture and database schema. A central gateway routes requests to LLM providers, samples traffic to the SEAR judge for evaluation, and logs operational metrics for all requests. Solid arrows denote mandatory foreign keys and dashed arrows denote optional foreign keys from the gateway metrics table.
  • Figure 2: Each table's structured output call receives the conversation context and all upstream structured table outputs. Dashed arrows indicate input dependencies.
  • Figure 3: Per-table accuracy by signal type (GPT-5-mini, high effort, with in-schema reasoning).
  • Figure 4: Full SEAR database schema including the four semantic evaluation tables and the gateway metrics table. Each table lists all columns with their types.
  • Figure 5: Cross-table violation detection for the tool_call signal family.
  • ...and 8 more figures
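The multi-stage generation in Figure 2, where each table's structured output call receives the conversation context plus the outputs of every upstream table, amounts to a sequential loop that accumulates results. The sketch below assumes a hypothetical stage order and stand-in table names (`context`, `response`, `issue_attribution`, `quality`) and takes the judge as an injected function; in SEAR that call would be an LLM structured-output request, whose exact interface is not given here.

```python
from typing import Callable, Dict, List

# Hypothetical stage order; SEAR's actual table names and ordering may differ.
STAGES: List[str] = ["context", "response", "issue_attribution", "quality"]

# A judge takes (table_name, conversation, upstream_outputs) and returns the
# column values for that table. It is a parameter here, not a real LLM client.
JudgeFn = Callable[[str, str, Dict[str, dict]], dict]


def run_stages(conversation: str, judge: JudgeFn) -> Dict[str, dict]:
    """Generate each table in order, feeding it all upstream table outputs."""
    outputs: Dict[str, dict] = {}
    for table in STAGES:
        # Shallow copy so a judge cannot add or drop entries in the shared dict.
        upstream = dict(outputs)
        outputs[table] = judge(table, conversation, upstream)
    return outputs


def fake_judge(table: str, conversation: str, upstream: Dict[str, dict]) -> dict:
    # Stand-in judge that only records which upstream tables it was given.
    return {"seen_upstream": sorted(upstream), "conv_chars": len(conversation)}


result = run_stages("user: hi\nassistant: hello", fake_judge)
```

Because `fake_judge` only reports which upstream tables it received, the first stage sees none and the last stage sees all three earlier tables, mirroring the dashed input dependencies in Figure 2.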