VIGIL: A Reflective Runtime for Self-Healing Agents
Christopher Cruz
TL;DR
This work tackles brittleness in autonomous agent stacks by introducing VIGIL, an out-of-band reflective runtime that supervises a target agent without performing its tasks. VIGIL ingests behavioral logs, maps events to an EmoBank of affective states, and diagnoses behavior using a Roses/Buds/Thorns framework, generating guarded prompt updates and code diffs to improve reliability. A key contribution is the meta-procedural capability: VIGIL not only fixes external agent behavior but also identifies and patches flaws in its own diagnostic pipeline, demonstrating resilience in the presence of toolchain failures. The approach offers a practical path toward runtime governance and self-maintenance for large-scale agent systems, with implications for trust, interpretability, and long-horizon reliability in deployed AI systems.
Abstract
Agentic LLM frameworks promise autonomous behavior via task decomposition, tool use, and iterative planning, but most deployed systems remain brittle. They lack runtime introspection, cannot diagnose their own failure modes, and do not improve over time without human intervention. In practice, many agent stacks degrade into decorated chains of LLM calls with no structural mechanisms for reliability. We present VIGIL (Verifiable Inspection and Guarded Iterative Learning), a reflective runtime that supervises a sibling agent and performs autonomous maintenance rather than task execution. VIGIL ingests behavioral logs, appraises each event into a structured emotional representation, maintains a persistent EmoBank with decay and contextual policies, and derives an RBT (Roses/Buds/Thorns) diagnosis that sorts recent behavior into strengths, opportunities, and failures. From this analysis, VIGIL generates two kinds of output: guarded prompt updates that preserve core identity semantics, and read-only code proposals produced by a strategy engine that operates on log evidence and code hotspots. VIGIL functions as a state-gated pipeline in which illegal transitions produce explicit errors rather than allowing the LLM to improvise. In a reminder-latency case study, VIGIL identified elevated lag and proposed prompt and code repairs. When its own diagnostic tool failed due to a schema conflict, it surfaced the internal error, produced a fallback diagnosis, and emitted a repair plan. This demonstrates meta-level self-repair in a deployed agent runtime.
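To make the state-gating idea concrete, the following is a minimal sketch of a pipeline that rejects illegal transitions with an explicit error instead of letting a caller skip ahead. It is an illustration only, not VIGIL's implementation: the stage names, the `ALLOWED` table, and the `GatedPipeline` and `IllegalTransition` classes are all hypothetical, chosen to mirror the ingest, appraise, diagnose, and propose steps described in the abstract.

```python
from enum import Enum, auto


class Stage(Enum):
    # Hypothetical stage names, loosely following the abstract's description.
    INGEST = auto()
    APPRAISE = auto()
    DIAGNOSE = auto()
    PROPOSE = auto()
    DONE = auto()


# Assumed transition table: each stage may only advance to the stages listed.
ALLOWED = {
    Stage.INGEST: {Stage.APPRAISE},
    Stage.APPRAISE: {Stage.DIAGNOSE},
    Stage.DIAGNOSE: {Stage.PROPOSE},
    Stage.PROPOSE: {Stage.DONE},
    Stage.DONE: set(),
}


class IllegalTransition(RuntimeError):
    """Raised when a caller requests a transition the table does not permit."""


class GatedPipeline:
    def __init__(self):
        self.stage = Stage.INGEST

    def advance(self, target):
        # Fail loudly and leave the current stage unchanged on an illegal request.
        if target not in ALLOWED[self.stage]:
            raise IllegalTransition(
                f"{self.stage.name} -> {target.name} is not allowed"
            )
        self.stage = target
        return self.stage
```

Under these assumptions, calling `advance(Stage.PROPOSE)` from the `APPRAISE` stage raises `IllegalTransition` and leaves the pipeline in `APPRAISE`, so a proposal can never be produced before a diagnosis exists.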
