PARC: An Autonomous Self-Reflective Coding Agent for Robust Execution of Long-Horizon Tasks
Yuki Orimo, Iori Kurata, Hodaka Mori, Ryuhei Okuno, Ryohto Sawada, Daisuke Okanohara
TL;DR
This work tackles the difficulty of executing long-horizon tasks autonomously by introducing PARC, a planning–execution framework enhanced with self-assessment and self-feedback. Built atop a standard coding agent, PARC uses a planner and independently scoped workers to manage multi-step workflows, and it relies on long-horizon feedback to correct strategic errors. In materials-science simulations and data-science Kaggle challenges, PARC autonomously carries out workflows of tens of tasks and hundreds of steps. Its results are competitive with human baselines and, when it is given auxiliary information, sometimes surpass them. The findings suggest that architecture-level improvements that enable deliberative reasoning and trial-and-error can push AI toward autonomous scientific discovery and large-scale analysis. They also identify broader error detection and better tool discovery as directions for future improvement.
Abstract
We introduce PARC, a coding agent for the autonomous and robust execution of long-horizon computational tasks. PARC is built on a hierarchical multi-agent architecture that combines task planning and execution with self-assessment and self-feedback, a mechanism that evaluates the agent's own actions and their outcomes from an independent context and feeds the resulting assessment back. This design enables PARC to detect and correct high-level strategic errors and to sustain progress without human intervention. We evaluate PARC on computational science and data science tasks. In materials science, it autonomously reproduces key results from studies on lithium-ion conduction and alloy segregation. In particular, it coordinates dozens of parallel simulation tasks, each requiring roughly 43 hours of computation, and handles orchestration, monitoring, and error correction end-to-end. In Kaggle-based experiments, PARC starts from minimal natural-language instructions, conducts data analysis, and implements search strategies, producing solutions competitive with human-engineered baselines. These results highlight the potential of integrating a hierarchical multi-agent system with self-assessment and self-feedback to enable AI systems capable of independent, large-scale scientific and analytical work.
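To make the planning, execution, and self-assessment cycle described above more concrete, the following is a minimal Python sketch of such a loop. It is not PARC's implementation. The names `Task`, `Assessment`, `run_workflow`, `toy_plan`, `toy_execute`, and `toy_assess` are hypothetical, and the LLM-backed planner, workers, and assessor are replaced by plain functions. The sketch assumes that each worker starts from a fresh, task-scoped context and that the assessor sees only the task and its output. It shows task-level retry driven by feedback and omits replanning after a high-level strategic error.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Task:
    name: str
    attempts: int = 0


@dataclass
class Assessment:
    ok: bool
    feedback: str = ""


def run_workflow(
    plan: Callable[[str], list[Task]],
    execute: Callable[[Task, list[str]], str],
    assess: Callable[[Task, str], Assessment],
    goal: str,
    max_attempts: int = 3,
) -> dict[str, str]:
    """Hypothetical plan/execute/assess loop (illustrative only, not PARC's code)."""
    results: dict[str, str] = {}
    for task in plan(goal):
        feedback: list[str] = []  # fresh, task-scoped context for each worker
        while task.attempts < max_attempts:
            task.attempts += 1
            output = execute(task, feedback)
            # The assessor judges only the task and its output, not the worker's history.
            verdict = assess(task, output)
            if verdict.ok:
                results[task.name] = output
                break
            feedback.append(verdict.feedback)  # self-feedback for the next attempt
        else:
            raise RuntimeError(f"task {task.name!r} failed after {max_attempts} attempts")
    return results


# Toy stand-ins (hypothetical) for the planner, worker, and assessor.
def toy_plan(goal: str) -> list[Task]:
    return [Task("prepare"), Task("simulate")]


def toy_execute(task: Task, feedback: list[str]) -> str:
    # The simulation "diverges" until the worker has received some feedback.
    if task.name == "simulate" and not feedback:
        return "diverged"
    return f"{task.name}: done"


def toy_assess(task: Task, output: str) -> Assessment:
    if output == "diverged":
        return Assessment(False, "reduce the time step")
    return Assessment(True)


print(run_workflow(toy_plan, toy_execute, toy_assess, "reproduce result"))
# {'prepare': 'prepare: done', 'simulate': 'simulate: done'}
```

In this toy run, the "simulate" task fails on its first attempt, the assessor's feedback is passed to the second attempt, and the retry succeeds. Keeping the assessor's inputs separate from the worker's accumulated context is one simple way to approximate evaluation "from an independent context".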
