Table of Contents
Fetching ...

CoME: Empowering Channel-of-Mobile-Experts with Informative Hybrid-Capabilities Reasoning

Yuxuan Liu, Weikai Xu, Kun Huang, Changyu Chen, Jiankun Zhao, Pengzhi Gao, Wei Liu, Jian Luan, Shuo Shang, Bo Du, Ji-Rong Wen, Rui Yan

TL;DR

Channel-of-Mobile-Experts (CoME) is proposed, a novel agent architecture consisting of four distinct experts, each aligned with a specific reasoning stage, which activates the corresponding expert to generate output tokens in each reasoning stage via output-oriented activation.

Abstract

Mobile Agents can autonomously execute user instructions, which requires hybrid-capabilities reasoning, including screen summary, subtask planning, action decision and action function. However, existing agents struggle to achieve both decoupled enhancement and balanced integration of these capabilities. To address these challenges, we propose Channel-of-Mobile-Experts (CoME), a novel agent architecture consisting of four distinct experts, each aligned with a specific reasoning stage, CoME activates the corresponding expert to generate output tokens in each reasoning stage via output-oriented activation. To empower CoME with hybrid-capabilities reasoning, we introduce a progressive training strategy: Expert-FT enables decoupling and enhancement of different experts' capability; Router-FT aligns expert activation with the different reasoning stage; CoT-FT facilitates seamless collaboration and balanced optimization across multiple capabilities. To mitigate error propagation in hybrid-capabilities reasoning, we propose InfoGain-Driven DPO (Info-DPO), which uses information gain to evaluate the contribution of each intermediate step, thereby guiding CoME toward more informative reasoning. Comprehensive experiments show that CoME outperforms dense mobile agents and MoE methods on both AITZ and AMEX datasets.

CoME: Empowering Channel-of-Mobile-Experts with Informative Hybrid-Capabilities Reasoning

TL;DR

Channel-of-Mobile-Experts (CoME) is proposed, a novel agent architecture consisting of four distinct experts, each aligned with a specific reasoning stage, which activates the corresponding expert to generate output tokens in each reasoning stage via output-oriented activation.

Abstract

Mobile Agents can autonomously execute user instructions, which requires hybrid-capabilities reasoning, including screen summary, subtask planning, action decision and action function. However, existing agents struggle to achieve both decoupled enhancement and balanced integration of these capabilities. To address these challenges, we propose Channel-of-Mobile-Experts (CoME), a novel agent architecture consisting of four distinct experts, each aligned with a specific reasoning stage, CoME activates the corresponding expert to generate output tokens in each reasoning stage via output-oriented activation. To empower CoME with hybrid-capabilities reasoning, we introduce a progressive training strategy: Expert-FT enables decoupling and enhancement of different experts' capability; Router-FT aligns expert activation with the different reasoning stage; CoT-FT facilitates seamless collaboration and balanced optimization across multiple capabilities. To mitigate error propagation in hybrid-capabilities reasoning, we propose InfoGain-Driven DPO (Info-DPO), which uses information gain to evaluate the contribution of each intermediate step, thereby guiding CoME toward more informative reasoning. Comprehensive experiments show that CoME outperforms dense mobile agents and MoE methods on both AITZ and AMEX datasets.
Paper Structure (40 sections, 19 equations, 10 figures, 14 tables)

This paper contains 40 sections, 19 equations, 10 figures, 14 tables.

Figures (10)

  • Figure 1: Left is an example of hybrid-capabilities reasoning. Right is the difference between input-oriented activation in MoE and output-oriented activation in CoME.
  • Figure 2: CoME architecture.
  • Figure 3: An example of InfoGain reward.
  • Figure 4: Performance on different reasoning stages.
  • Figure 5: Expert distribution of CoME (left) and MoE (right).
  • ...and 5 more figures