TabAgent: A Framework for Replacing Agentic Generative Components with Tabular-Textual Classifiers

Ido Levy, Eilam Shapira, Yinon Goldshtein, Avi Yaeli, Nir Mashkif, Segev Shlomov

TL;DR

This paper proposes TabAgent, a framework that replaces generative decision components in closed-set selection tasks with a compact textual-tabular classifier trained on execution traces, establishing a paradigm for learned discriminative replacements of generative bottlenecks in production agent architectures.

Abstract

Agentic systems, AI architectures that autonomously execute multi-step workflows to achieve complex goals, are often built using repeated large language model (LLM) calls for closed-set decision tasks such as routing, shortlisting, gating, and verification. While convenient, this design makes deployments slow and expensive due to cumulative latency and token usage. We propose TabAgent, a framework for replacing generative decision components in closed-set selection tasks with a compact textual-tabular classifier trained on execution traces. TabAgent (i) extracts structured schema, state, and dependency features from trajectories (TabSchema), (ii) augments coverage with schema-aligned synthetic supervision (TabSynth), and (iii) scores candidates with a lightweight classifier (TabHead). On the long-horizon AppWorld benchmark, TabAgent maintains task-level success while eliminating shortlist-time LLM calls, reducing latency by approximately 95% and inference cost by 85-91%. Beyond tool shortlisting, TabAgent generalizes to other agentic decision heads, establishing a paradigm for learned discriminative replacements of generative bottlenecks in production agent architectures.
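
To make the closed-set scoring idea concrete, the following is a minimal sketch, not the authors' implementation. It assumes a toy feature set (word overlap between task and tool description, plus a dependency-satisfied flag) standing in for TabSchema, and a plain logistic regression standing in for TabHead. Every name, feature, and data row here is hypothetical.

```python
import math

# Hypothetical tool catalog: the closed candidate set.
TOOLS = [
    {"name": "login", "description": "log in to the account", "requires": []},
    {"name": "search_songs", "description": "search songs by title", "requires": ["login"]},
    {"name": "send_email", "description": "send an email message", "requires": ["login"]},
]

# Hypothetical execution traces: (task text, state, tool actually used).
TRACES = [
    ("search songs by artist", {"called": ["login"]}, "search_songs"),
    ("send an email to bob", {"called": ["login"]}, "send_email"),
    ("log in to my account", {"called": []}, "login"),
]

def featurize(task, state, tool):
    """Build one tabular row for a (context, candidate) pair (assumed features)."""
    task_words = set(task.lower().split())
    tool_words = set(tool["description"].lower().split())
    overlap = len(task_words & tool_words) / max(len(tool_words), 1)
    deps_met = float(all(d in state["called"] for d in tool["requires"]))
    return [1.0, overlap, deps_met]  # leading 1.0 acts as the bias term

def predict(weights, row):
    """Logistic score in (0, 1) for one candidate row."""
    z = sum(w * x for w, x in zip(weights, row))
    return 1.0 / (1.0 + math.exp(-z))

def train(rows, labels, epochs=500, lr=0.5):
    """Fit logistic regression with plain stochastic gradient descent."""
    weights = [0.0] * len(rows[0])
    for _ in range(epochs):
        for row, y in zip(rows, labels):
            err = predict(weights, row) - y
            weights = [w - lr * err * x for w, x in zip(weights, row)]
    return weights

def shortlist(weights, task, state, tools, k=2):
    """Score every candidate without an LLM call and keep the top k."""
    scored = [(predict(weights, featurize(task, state, t)), t["name"]) for t in tools]
    return [name for _, name in sorted(scored, reverse=True)[:k]]

# One training row per (trace, candidate); the label marks the tool that was used.
rows = [featurize(task, state, t) for task, state, _ in TRACES for t in TOOLS]
labels = [float(t["name"] == used) for _, _, used in TRACES for t in TOOLS]
weights = train(rows, labels)

top = shortlist(weights, "search songs called yesterday", {"called": ["login"]}, TOOLS, k=1)
```

The design point this illustrates is that, once the candidate set is closed, selection reduces to scoring rows of a table, so inference involves no token generation.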

Paper Structure (67 sections, 5 equations, 8 figures, 15 tables, 1 algorithm)

Figures (8)

  • Figure 1: (a) TabSchema compiles trajectory-derived schema, state, and dependency signals into tabular features. (b) TabHead consumes features and candidates to output calibrated probabilities.
  • Figure 2: Runtime–cost trade-off on log–log axes. Marker area encodes macro $P@R$ across five apps. The curve shows the Pareto frontier (non-dominated trade-offs). TabAgent lies in the faster-and-cheaper corner, while GPT-4.1 (API), the SOTA generative reference, attains higher macro $P@R$ at substantially higher cost and latency. We omit DSR-FT to isolate LLM vs. classifier effects.
  • Figure 3: TabAgent on CodeAct with the task description as the only feature vs. TabAgent on CUGA (TabSchema workflow features) on AZ/GM. Slopegraph for $P@R$, Recall@7, and Recall@9; lines rise left$\to$right, indicating that extracted workflow features outperform task-description-only input.
  • Figure 4: Difficulty-level distribution over all tasks ($N{=}605$).
  • Figure 5: Distribution of tools used per task ($N{=}605$).
  • ...and 3 more figures
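
The Pareto frontier mentioned in the Figure 2 caption can be computed as follows. This is a minimal sketch that assumes lower is better on both axes (latency and cost). The system names and numbers are illustrative placeholders, not values from the paper.

```python
def pareto_frontier(points):
    """Return names of points not dominated on (latency, cost); lower is better.

    A point is dominated if another point is no worse on both axes
    and strictly better on at least one.
    """
    frontier = []
    for name, latency, cost in points:
        dominated = any(
            (l2 <= latency and c2 <= cost) and (l2 < latency or c2 < cost)
            for _, l2, c2 in points
        )
        if not dominated:
            frontier.append(name)
    return frontier

# Placeholder systems: (name, latency, cost), arbitrary units.
systems = [("A", 1.0, 5.0), ("B", 2.0, 2.0), ("C", 3.0, 3.0), ("D", 5.0, 1.0)]
frontier = pareto_frontier(systems)  # "C" is dominated by "B"
```

A quality metric such as macro $P@R$ is not part of this two-axis computation; in the figure it is encoded separately as marker area.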