PARC: An Autonomous Self-Reflective Coding Agent for Robust Execution of Long-Horizon Tasks

Yuki Orimo, Iori Kurata, Hodaka Mori, Ryuhei Okuno, Ryohto Sawada, Daisuke Okanohara

TL;DR

This work tackles the difficulty of autonomous long-horizon task execution by introducing PARC, a planning–execution framework enhanced with self-assessment and self-feedback. Built atop a standard coding agent, PARC employs a planner and independently scoped workers to manage multi-step workflows, using long-horizon feedback to correct strategic errors. Across materials-science simulations and data-science Kaggle challenges, PARC autonomously executes tens of tasks with hundreds of steps, achieving results competitive with human baselines and sometimes surpassing them with auxiliary information. The findings suggest that architecture-level improvements enabling deliberative reasoning and trial-and-error can push AI toward autonomous scientific discovery and large-scale analysis, while highlighting directions for improving error-detection breadth and tool discovery.

Abstract

We introduce PARC, a coding agent for the autonomous and robust execution of long-horizon computational tasks. PARC is built on a hierarchical multi-agent architecture incorporating task planning, execution, and a mechanism that evaluates its own actions and their outcomes from an independent context and provides feedback, namely self-assessment and self-feedback. This design enables PARC to detect and correct high-level strategic errors and sustain progress without human intervention. We evaluate PARC across computational science and data science tasks. In materials science, it autonomously reproduces key results from studies on lithium-ion conduction and alloy segregation. In particular, it coordinates dozens of parallel simulation tasks, each requiring roughly 43 hours of computation, managing orchestration, monitoring, and error correction end-to-end. In Kaggle-based experiments, starting from minimal natural-language instructions, PARC conducts data analysis and implements search strategies, producing solutions competitive with human-engineered baselines. These results highlight the potential of integrating a hierarchical multi-agent system with self-assessment and self-feedback to enable AI systems capable of independent, large-scale scientific and analytical work.
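The plan, execute, and self-assess loop described in the abstract can be pictured with a minimal sketch. This is not PARC's implementation: `Task`, `Workspace`, `execute_plan`, the `assess` callback, and the retry limit are all hypothetical names and choices made for illustration. The only ideas taken from the text are that tasks run in sequence, that results are shared through a workspace, and that an assessment step which sees only a task's outcome returns feedback to the worker.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class Task:
    """One planned step (hypothetical structure, not PARC's own)."""
    name: str
    # run(previous_results, feedback) -> result of this task
    run: Callable[[Dict[str, object], str], object]


@dataclass
class Workspace:
    """Shared store that later tasks read earlier results from."""
    results: Dict[str, object] = field(default_factory=dict)
    log: List[str] = field(default_factory=list)


def execute_plan(
    tasks: List[Task],
    assess: Callable[[str, object], Tuple[bool, str]],
    max_retries: int = 2,
) -> Workspace:
    """Run tasks in order, retrying a task when its assessment fails."""
    ws = Workspace()
    for task in tasks:
        feedback = ""  # no feedback on the first attempt
        for attempt in range(max_retries + 1):
            result = task.run(ws.results, feedback)
            # The assessor sees only the task name and its result,
            # not the worker's internal state.
            ok, feedback = assess(task.name, result)
            status = "ok" if ok else "retry"
            ws.log.append(f"{task.name} attempt {attempt + 1}: {status}")
            if ok:
                ws.results[task.name] = result
                break
        else:
            # Every attempt was rejected; stop instead of building on a bad result.
            raise RuntimeError(
                f"{task.name} failed after {max_retries + 1} attempts"
            )
    return ws
```

In a real agent, `run` and `assess` would each wrap a language-model call with its own context. Here they are plain callables so that the control flow stays visible.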

Paper Structure

This paper contains 11 sections, 6 figures, 2 tables.

Figures (6)

  • Figure 1: Schematic overview of the PARC workflow. A user interacts with the planner to perform planning and generate a sequence of tasks. Workers then execute these tasks sequentially, writing task results to and reading preceding results from the structured workspace. Through self-reflection (self-assessment and self-feedback), the workers can robustly make progress on long-horizon tasks.
  • Figure 2: Key outputs of PARC on MD simulations of Li ion diffusion in the solid electrolyte $\mathrm{Li_{10}GeP_2S_{11.5}O_{0.5}}$. (a) Overview of the task sequence generated by the planner. (b) $\mathrm{Li_{10}GeP_2S_{11.5}O_{0.5}}$ structure generated by PARC. This structure was visualized using VESTA (Ref. vesta). Although the structure differs slightly from that in the original paper (Ref. sawada2024high) because the generation method involves random search, it was constructed using the correct procedure. (c) Time evolution of MSD of Li ions calculated by PARC from simulation results at each temperature. Although simulations were performed for 500 ps, the data covers approximately 165 ps because the trajectory was divided into three segments for block averaging. (d) Arrhenius plot of diffusion coefficients calculated by PARC from (c). The activation energy derived from the slope is 0.231 eV. (e) Corresponding results reproduced from Figure 3(a) and (b) of the original paper (CC BY 4.0). Subpanel (a) displays the MSD of Li ions in the LGPS structure without O. Note that our results in Panel (c) plot the MSD in $\mathrm{Li_{10}GeP_2S_{11.5}O_{0.5}}$. Subpanel (b) displays Arrhenius plots for LGPSO structures; our results in Panel (d) correspond to the $x=0.5$ case in this graph.
  • Figure 3: Key outputs of PARC on simulations of the effect of light interstitials in $\mathrm{Cr_{30}Ni}$ alloys. (a) Overview of the task sequence generated by the planner. (b) PARC's simulation results for crystal structure stability as a function of interstitial element species. Structures were visualized using VESTA (Ref. vesta). Top: Evolution of structural fractions (FCC/HCP/BCC/Other) with B doping (1, 4, and 10 at.%) and the final atomic configuration at 10 at.% (only B atoms are visualized). B doping reduces the FCC fraction with increasing concentration and leads to segregation. Bottom: Evolution of structural fractions with N doping (1, 4, and 10 at.%) and the final configuration at 10 at.% (only N atoms are visualized). N doping maintains the FCC structure up to 10 at.%. (c) Corresponding results from the original study (Ref. DOLEZAL2025121221). Our results are consistent with the reference, demonstrating that PARC correctly executed the implementation, simulation, and analysis. Panel (c) is reproduced from Figure 2 (b–c) of the preprint (Ref. Dole_al_2025), with permission from the authors.
  • Figure 4: Key outputs of PARC on MD simulations of yttria-stabilized zirconia (YSZ) under an external electric field. (a) Overview of the task sequence generated by the planner. (b) YSZ structure generated by PARC. While the original paper used a composition of $\mathrm{Y_{16}Zr_{92}O_{208}}$, PARC generated an incorrect structure with lower Y content ($\mathrm{Y_{9}Zr_{99}O_{212}}$). (c) Time evolution of oxygen atomic displacement at 800 K under various external electric field strengths. The "Original" shows the displacement re-plotted from data used in the original study (Ref. hisama2023); "Agent" shows the displacement analyzed by PARC; and "Agent + Human" shows the displacement derived from human analysis of the agent's simulation trajectory using Eq. (1). (d) Ionic conductivity at each electric field strength (calculated based on Eq. (2) of the original paper). (e) Voltage dependence on current density (calculated based on Eqs. (3) and (4) of the original paper).
  • Figure 5: Overview of the task sequence for the Kaggle competition "NeurIPS Open Polymer Prediction Challenge 2025" generated by the planner.
  • ...and 1 more figure
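The analysis behind Figure 2(c) and (d), going from MSD curves to diffusion coefficients and then to an activation energy, can be sketched as follows. This is a hedged illustration, not the code PARC generated. It assumes the standard three-dimensional Einstein relation $D = \frac{1}{6}\,\mathrm{d}\,\mathrm{MSD}/\mathrm{d}t$ and the Arrhenius form $D = D_0 \exp(-E_a / k_B T)$. The function names and the plain least-squares fit are choices made for this example, and block averaging is omitted.

```python
import math
from typing import List, Tuple

K_B_EV = 8.617333262e-5  # Boltzmann constant in eV/K


def linear_fit(xs: List[float], ys: List[float]) -> Tuple[float, float]:
    """Ordinary least-squares line through (xs, ys); returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def diffusion_coefficient(times: List[float], msd: List[float]) -> float:
    """Einstein relation in 3D (assumed here): D = slope(MSD vs t) / 6.

    Units follow the inputs, e.g. Å^2/ps for MSD in Å^2 and time in ps.
    """
    slope, _ = linear_fit(times, msd)
    return slope / 6.0


def activation_energy(temps_k: List[float], diff_coeffs: List[float]) -> float:
    """Arrhenius fit: ln D = ln D0 - Ea / (kB * T); returns Ea in eV."""
    inv_t = [1.0 / t for t in temps_k]
    ln_d = [math.log(d) for d in diff_coeffs]
    slope, _ = linear_fit(inv_t, ln_d)
    return -slope * K_B_EV
```

One would call `diffusion_coefficient` on the MSD series at each temperature, then pass the temperatures and the resulting coefficients to `activation_energy`. Under block averaging, as the caption describes, the same fit would be applied to each trajectory segment and the per-segment coefficients averaged.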