Natural-Language Agent Harnesses

Linyue Pan, Lexiao Zou, Shuo Guo, Jingchen Ni, Hai-Tao Zheng

Abstract

Agent performance increasingly depends on harness engineering, yet harness design is usually buried in controller code and runtime-specific conventions, making it hard to transfer, compare, and study as a scientific object. We ask whether the high-level control logic of an agent harness can instead be externalized as a portable executable artifact. We introduce Natural-Language Agent Harnesses (NLAHs), which express harness behavior in editable natural language, and the Intelligent Harness Runtime (IHR), a shared runtime that executes these harnesses through explicit contracts, durable artifacts, and lightweight adapters. Across coding and computer-use benchmarks, we conduct controlled evaluations of operational viability, module ablation, and code-to-text harness migration.
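The abstract says that IHR executes an NLAH through explicit contracts and durable artifacts. The sketch below is a minimal, hypothetical illustration of that idea and is not the authors' IHR. Several choices in it are assumptions made only for illustration:

- The names `Stage`, `Harness`, `run_harness`, and `fake_llm` are invented here.
- A contract is modeled as a list of keys that a stage's output must contain.
- A durable artifact is modeled as one JSON file per stage.
- The in-loop LLM is replaced by a deterministic stub.

```python
import json
import tempfile
from dataclasses import dataclass
from pathlib import Path
from typing import Callable

@dataclass
class Stage:
    """One step of a hypothetical natural-language harness."""
    name: str
    instruction: str          # editable natural-language control logic
    required_keys: list[str]  # explicit contract: keys the output must contain

@dataclass
class Harness:
    stages: list[Stage]

# Stand-in type for the in-loop LLM: (instruction, current state) -> output dict.
LLM = Callable[[str, dict], dict]

def run_harness(harness: Harness, task: dict, llm: LLM, workdir: Path) -> dict:
    """Run stages in order, check each stage's contract, and persist every
    stage output as a JSON artifact so a rerun can resume from disk."""
    state = dict(task)
    for stage in harness.stages:
        artifact = workdir / f"{stage.name}.json"
        if artifact.exists():
            # Durable artifact from an earlier run: reuse it and skip the LLM call.
            output = json.loads(artifact.read_text())
        else:
            output = llm(stage.instruction, state)
            missing = [k for k in stage.required_keys if k not in output]
            if missing:
                raise ValueError(f"stage {stage.name!r} broke its contract; missing {missing}")
            artifact.write_text(json.dumps(output))
        state.update(output)
    return state

def fake_llm(instruction: str, state: dict) -> dict:
    """Deterministic stub for the in-loop LLM (hypothetical; no model is called)."""
    if instruction.startswith("Plan"):
        return {"plan": f"fix bug in {state['repo']}"}
    return {"patch": "example-patch", "verified": True}

harness = Harness([
    Stage("plan", "Plan the change before editing any file.", ["plan"]),
    Stage("implement", "Implement the plan, then run the tests.", ["patch", "verified"]),
])

with tempfile.TemporaryDirectory() as d:
    result = run_harness(harness, {"repo": "demo"}, fake_llm, Path(d))
```

In this sketch, each stage's output is written to disk before the next stage runs. A rerun on the same working directory therefore restores completed stages instead of calling the model again. This is one plausible reading of "durable artifacts", chosen here for simplicity.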

Paper Structure

This paper contains 44 sections, 3 equations, 4 figures, and 7 tables.

Figures (4)

  • Figure 1: Examples of harness design patterns used by modern agents (reason-act, retrieval, reflection, verification, memory, search, orchestration).
  • Figure 2: Framework overview. The Intelligent Harness Runtime (IHR) combines an in-loop LLM, a backend with tool access and child-agent support, and a runtime charter that specifies policy and semantics. It executes a Natural-Language Agent Harness (NLAH) over task instances. The NLAH exposes harness logic, roles, contracts, adapters, and state conventions.
  • Figure 3: Realization mapping: backend + runtime skill (charter) + harness skill (task-family logic).
  • Figure 4: Supplementary views for SWE RQ2. Left: resolved rate versus estimated token-based API cost per sample under public GPT-5.4 text pricing. Right: standalone solved rate and union solved rate with Basic for each ablated module.
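Figure 4 plots resolved rate against estimated token-based API cost per sample. It also compares standalone solved rates with union solved rates against Basic. Below is a minimal sketch of how such quantities can be computed from per-configuration token totals and sets of resolved instance IDs. It is not the paper's evaluation code, and several parts are placeholders:

- The `Run` record and the function names are hypothetical.
- The prices are not the published GPT-5.4 rates.
- The toy data is invented.
- The union rate is read here as the fraction of instances solved by either the ablated variant or Basic.

```python
from dataclasses import dataclass

# Placeholder prices in USD per million tokens. These are NOT the published
# GPT-5.4 rates; substitute real pricing before interpreting any number.
PRICE_IN_PER_M = 1.0
PRICE_OUT_PER_M = 10.0

@dataclass
class Run:
    """Aggregate results of one configuration on a fixed set of instances."""
    solved: set[str]    # IDs of instances this configuration resolved
    input_tokens: int   # prompt tokens summed over all samples
    output_tokens: int  # completion tokens summed over all samples
    n_samples: int      # number of evaluated instances

def cost_per_sample(run: Run) -> float:
    """Estimated API cost per sample from token counts and per-token prices."""
    total_usd = (run.input_tokens * PRICE_IN_PER_M
                 + run.output_tokens * PRICE_OUT_PER_M) / 1_000_000
    return total_usd / run.n_samples

def solved_rate(run: Run) -> float:
    """Standalone solved rate of a single configuration."""
    return len(run.solved) / run.n_samples

def union_solved_rate(run: Run, basic: Run) -> float:
    """Fraction of instances solved by either this configuration or Basic."""
    if run.n_samples != basic.n_samples:
        raise ValueError("both runs must cover the same instance set")
    return len(run.solved | basic.solved) / run.n_samples

# Toy, made-up runs over the same four instances.
basic = Run(solved={"a", "b"}, input_tokens=2_000_000, output_tokens=100_000, n_samples=4)
variant = Run(solved={"b", "c"}, input_tokens=3_000_000, output_tokens=200_000, n_samples=4)
```

With these toy numbers, `variant` has the same standalone solved rate as `basic` (0.5) but a union rate of 0.75. This shows how a variant can contribute complementary solves without raising its own solved rate.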