Robust, Observable, and Evolvable Agentic Systems Engineering: A Principled Framework Validated via the Fairy GUI Agent

Jiazheng Sun; Ruimeng Yang; Xu Han; Jiayang Niu; Mingxuan Li; Te Yang; Yongyong Lu; Xin Peng

TL;DR

The paper argues that Agentic AI suffers from a lack of Software Engineering rigor, which manifests as fragility, opaque internals, and poor long-term adaptability. To address these issues, it introduces a principled SE framework consisting of Runtime Goal Refinement (RGR), Observable Cognitive Architecture (OCA), and Evolutionary Memory Architecture (EMA), and instantiates the framework in Fairy, a mobile GUI agent. On the AndroidWorld and RealMobile-Eval benchmarks, Fairy demonstrates substantial gains in task performance and maintainability, with ablation studies isolating the contribution of each principle. The work contributes a normative blueprint for constructing robust, observable, and evolvable Agentic AI systems and discusses limitations and future paths toward broader applicability and formalization.

Abstract

The Agentic Paradigm faces a significant Software Engineering Absence, yielding Agentic systems commonly lacking robustness, observability, and evolvability. To address these deficiencies, we propose a principled engineering framework comprising Runtime Goal Refinement (RGR), Observable Cognitive Architecture (OCA), and Evolutionary Memory Architecture (EMA). In this framework, RGR ensures robustness and intent alignment via knowledge-constrained refinement and human-in-the-loop clarification; OCA builds an observable and maintainable white-box architecture using component decoupling, logic layering, and state-control separation; and EMA employs an execution-evolution dual-loop for evolvability. We implemented and empirically validated Fairy, a mobile GUI agent based on this framework. On RealMobile-Eval, our novel benchmark for ambiguous and complex tasks, Fairy outperformed the best SoTA baseline in user requirement completion by 33.7%. Subsequent controlled experiments, human-subject studies, and ablation studies further confirmed that the RGR enhances refinement accuracy and prevents intent deviation; the OCA improves maintainability; and the EMA is crucial for long-term performance. This research provides empirically validated specifications and a practical blueprint for building reliable, observable, and evolvable Agentic AI systems.

Paper Structure

Table of Contents

Figures (14)