
Building AI Coding Agents for the Terminal: Scaffolding, Harness, Context Engineering, and Lessons Learned

Nghi D. Q. Bui

Abstract

The landscape of AI coding assistance is undergoing a fundamental shift from complex IDE plugins to versatile, terminal-native agents. Operating directly where developers manage source control, execute builds, and deploy environments, CLI-based agents can act with a high degree of autonomy on long-horizon development tasks. In this paper, we present OpenDev, an open-source, command-line coding agent engineered specifically for this new paradigm. Effective autonomous assistance requires strict safety controls and efficient context management to prevent context bloat and reasoning degradation. OpenDev addresses these challenges through a compound AI system architecture with workload-specialized model routing, a dual-agent architecture that separates planning from execution, lazy tool discovery, and adaptive context compaction that progressively compresses older observations. Furthermore, it employs an automated memory system that accumulates project-specific knowledge across sessions, and it counteracts instruction fade-out through event-driven system reminders. By enforcing explicit reasoning phases and prioritizing context efficiency, OpenDev provides a secure, extensible foundation for terminal-first AI assistance and offers a blueprint for robust autonomous software engineering.
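To make the idea of adaptive context compaction concrete, the sketch below shows one way a history of tool observations could be shrunk progressively as it approaches a token budget. It is a minimal illustration under stated assumptions, not OpenDev's algorithm: the `Message` type, the two stages (truncate, then elide), the whitespace "tokenizer", and the `compact`, `keep_recent`, and `max_words` names are all hypothetical. The paper itself describes five compaction stages.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Message:
    role: str     # "user", "assistant", or "tool"
    content: str


def count_tokens(text: str) -> int:
    # Crude stand-in for a real tokenizer: one token per whitespace-separated word.
    return len(text.split())


def total_tokens(messages: list[Message]) -> int:
    return sum(count_tokens(m.content) for m in messages)


def truncate(m: Message, max_words: int) -> Message:
    # Stage 1 (hypothetical): keep only the first `max_words` words of an observation.
    words = m.content.split()
    if len(words) <= max_words:
        return m
    return replace(m, content=" ".join(words[:max_words]) + " [truncated]")


def elide(m: Message, max_words: int) -> Message:
    # Stage 2 (hypothetical): replace the observation with a short placeholder.
    # `max_words` is unused; it keeps the same signature as `truncate`.
    return replace(m, content="[observation elided]")


def compact(messages: list[Message], budget: int,
            keep_recent: int = 2, max_words: int = 20) -> list[Message]:
    """Shrink old tool observations, oldest first, escalating from truncation
    to elision until the history fits `budget`. The `keep_recent` newest tool
    observations are never modified."""
    tool_idx = [i for i, m in enumerate(messages) if m.role == "tool"]
    old = tool_idx[:-keep_recent] if keep_recent > 0 else tool_idx
    result = list(messages)  # copy; the caller's list is left unchanged
    for stage in (truncate, elide):
        for i in old:
            if total_tokens(result) <= budget:
                return result
            result[i] = stage(result[i], max_words)
    return result
```

Touching the oldest observations first is a deliberate choice in this sketch: recent tool output is the evidence the agent is most likely reasoning about, so it is kept verbatim while stale output degrades gracefully from a prefix to a placeholder. If the recent messages alone exceed the budget, the function returns the best it can do rather than dropping them.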

Paper Structure (218 sections, 19 figures, 9 tables, 1 algorithm)

Figures (19)

  • Figure 1: Overview of OpenDev. Work is organized into concurrent sessions, each composed of multiple specialized sub-agents; each agent executes typed workflows (Execution, Thinking, Compaction) that independently bind to a user-configured LLM. This four-level hierarchy (session → agent → workflow → LLM) enables fine-grained model selection, allowing cost, latency, and capability trade-offs to be optimized per workflow.
  • Figure 2: System architecture of OpenDev, organized into four layers: Entry & UI, Agent, Tool & Context, and Persistence. Arrows indicate primary data-flow directions.
  • Figure 3: Defense-in-depth safety architecture. Five independent layers intercept dangerous actions at progressively lower levels of abstraction, from model reasoning (Layer 1) to user-defined scripts (Layer 5). Each layer operates independently; failure of any single layer does not compromise the remaining four.
  • Figure 4: The agent harness architecture: a detailed view of the Agent layer from Figure 2. The central ReAct loop (six phases: pre-check and compaction, thinking, self-critique, action, tool execution, post-processing) is surrounded by seven supporting subsystems. User messages enter through a message injection queue (top). The Prompt Composition engine assembles modular sections by priority into the system prompt. The Tool Registry dispatches to specialized handlers, with MCP tools discovered lazily. The Safety System enforces multiple independent layers (approval, dangerous command detection, hooks, stale-read detection, plan mode restrictions, doom loop detection, iteration cap, cooperative cancellation). Context Engineering applies five-stage progressive compaction as the conversation grows. Memory and Session services provide persistent strategy memory (playbook), session storage, and per-step undo via git snapshots. Subagent Orchestration spawns isolated agent instances with filtered tool access for parallel exploration or specialized tasks.
  • Figure 5: Dual-mode operation within the Agent layer (Figure 4). A user prompt enters the MainAgent, which routes to either Plan Mode (left, read-only) or Normal Mode (right, full access). Plan Mode spawns a Planner subagent that explores the codebase, analyzes patterns, and produces a structured plan for user approval. Upon approval, the system transitions to Normal Mode, where the agent executes the planned steps with full tool access. The user can re-enter Plan Mode at any point if unexpected results require re-planning.
  • ...and 14 more figures
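Figure 1 describes a four-level hierarchy (session, agent, workflow, LLM) in which each typed workflow binds independently to a user-configured model. One way to express such bindings is a lookup with fallbacks, sketched here in Python. Everything in it is an assumption for illustration: the `ModelConfig`, `Agent`, and `Session` classes, the `resolve` and `bindings` methods, the precedence order (agent-specific override, then workflow override, then default), and the placeholder model names are hypothetical and do not come from OpenDev's code.

```python
from dataclasses import dataclass, field
from enum import Enum


class Workflow(Enum):
    # The three typed workflows named in Figure 1.
    EXECUTION = "execution"
    THINKING = "thinking"
    COMPACTION = "compaction"


@dataclass
class ModelConfig:
    """Hypothetical user configuration: a default model plus optional overrides."""
    default: str
    per_workflow: dict[Workflow, str] = field(default_factory=dict)
    per_agent: dict[tuple[str, Workflow], str] = field(default_factory=dict)

    def resolve(self, agent: str, workflow: Workflow) -> str:
        # Most specific binding wins: (agent, workflow) > workflow > default.
        if (agent, workflow) in self.per_agent:
            return self.per_agent[(agent, workflow)]
        return self.per_workflow.get(workflow, self.default)


@dataclass
class Agent:
    name: str
    workflows: tuple[Workflow, ...]


@dataclass
class Session:
    config: ModelConfig
    agents: list[Agent]

    def bindings(self) -> dict[tuple[str, Workflow], str]:
        # Walk session -> agent -> workflow and record which LLM each workflow uses.
        return {(a.name, w): self.config.resolve(a.name, w)
                for a in self.agents for w in a.workflows}


cfg = ModelConfig(
    default="model-large",
    per_workflow={Workflow.COMPACTION: "model-small"},
    per_agent={("planner", Workflow.THINKING): "model-reasoning"},
)
session = Session(cfg, [
    Agent("main", (Workflow.EXECUTION, Workflow.THINKING, Workflow.COMPACTION)),
    Agent("planner", (Workflow.THINKING,)),
])
for (agent, wf), model in session.bindings().items():
    print(f"{agent:8} {wf.value:11} -> {model}")
```

Resolving the most specific binding first lets a user, for example, send compaction to a cheaper model everywhere while pinning one agent's thinking workflow to a stronger model. This matches the per-workflow cost, latency, and capability trade-off that the Figure 1 caption describes.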