Training One Model to Master Cross-Level Agentic Actions via Reinforcement Learning
Kaichen He, Zihao Wang, Muyao Li, Anji Liu, Yitao Liang
TL;DR
CrossAgent is a unified agent that masters heterogeneous action spaces and autonomously selects the most effective interface at each step of a trajectory. It learns dynamic action-space switching through a three-stage pipeline: cold-start supervised fine-tuning (SFT) on mixed action spaces, Single-Turn RL with GRPO, and Multi-Turn RL (MTRL) enabled by self-training initialization. On the OpenHA Minecraft benchmark of over 800 tasks, CrossAgent achieves state-of-the-art performance and strong generalization, substantially outperforming fixed-action-space baselines. The work advances open-world generalist agents by showing that learnable, context-aware interface switching can deliver both efficiency and robustness in long-horizon reasoning.
Abstract
The paradigm of agentic AI is shifting from complex engineered workflows to native agentic models built through post-training. However, existing agents are typically confined to static, predefined action spaces, such as using only APIs, only GUI events, or only robotic commands. This rigidity limits their adaptability in dynamic environments, where the optimal granularity of interaction varies with context. To bridge this gap, we propose CrossAgent, a unified agentic model that masters heterogeneous action spaces and autonomously selects the most effective interface at each step of a trajectory. We introduce a comprehensive training pipeline that combines cold-start supervised fine-tuning with a Multi-Turn Group Relative Policy Optimization (GRPO) algorithm. This approach enables the agent to learn adaptive action switching, balancing high-level efficiency against low-level precision, without human-specified rules. Extensive experiments on over 800 tasks in the open-world Minecraft environment demonstrate that CrossAgent achieves state-of-the-art performance. By dynamically leveraging the strengths of diverse action spaces, our model significantly outperforms fixed-action baselines and exhibits superior generalization and efficiency in long-horizon reasoning. All code and models are available at https://github.com/CraftJarvis/OpenHA
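GRPO, used in both RL stages of the pipeline, replaces a learned value function with a group-relative baseline. For each task, a group of rollouts is sampled, and each rollout's scalar reward is normalized by the group's mean and standard deviation to obtain its advantage. The sketch below illustrates that standard computation. It is not taken from the OpenHA code: the function names, the use of the population standard deviation, the epsilon value, and the choice to give every turn of a multi-turn rollout the same trajectory-level advantage are all assumptions made for illustration.

```python
from statistics import fmean, pstdev


def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """Standard GRPO-style advantage: normalize each rollout's scalar reward
    against the mean and (population) standard deviation of its group,
    where a group is all rollouts sampled for the same task."""
    mu = fmean(rewards)
    sigma = pstdev(rewards)
    # eps avoids division by zero when every rollout gets the same reward;
    # in that case all advantages are 0 and the group gives no learning signal.
    return [(r - mu) / (sigma + eps) for r in rewards]


def broadcast_to_turns(advantages: list[float], turns_per_rollout: list[int]) -> list[list[float]]:
    """Hypothetical multi-turn variant (an assumption, not confirmed by the paper):
    every turn of rollout i is credited with rollout i's trajectory-level advantage."""
    return [[a] * n for a, n in zip(advantages, turns_per_rollout)]


if __name__ == "__main__":
    # Four rollouts of one task with a binary success reward (illustrative values).
    adv = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
    print([round(a, 3) for a in adv])  # [1.0, -1.0, -1.0, 1.0]
    print(broadcast_to_turns([0.5, -0.5], [2, 3]))  # [[0.5, 0.5], [-0.5, -0.5, -0.5]]
```

Because the baseline is computed from the group itself, successful rollouts are pushed up and failed ones pushed down relative to their siblings, with no critic network to train.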
