Pareto-guided Pipeline for Distilling Featherweight AI Agents in Mobile MOBA Games
Xionghui Yang, Bozhou Chen, Yunlong Lu, Yongyi Wang, Lingfeng Li, Lanxiao Huang, Lin Liu, Wenjun Wang, Meng Meng, Xia Lin, Wenxin Li
TL;DR
The paper tackles the challenge of deploying powerful MOBA game AI on resource-constrained mobile devices by formulating mobile deployment as a multi-objective optimization problem and seeking Pareto-optimal trade-offs. It introduces a Pareto-guided distillation pipeline that combines a featherweight student architecture with architecture search and policy distillation to balance win rate against latency, energy, memory, and model size. In Honor of Kings (HoK) 3v3 experiments, the Featherweight Agent achieves 12.4× faster inference and a 15.6× improvement in energy efficiency while maintaining a 40.32% win rate against the teacher, placing it on the empirical Pareto frontier. The work demonstrates a practical, end-to-end approach to compressing large-scale, multi-modal policies for real-time mobile deployment and offers insights for future hardware-aware co-design and broader mobile-domain applications.
Abstract
Recent advances in game AI have demonstrated the feasibility of training agents that surpass top-tier human professionals in complex environments such as Honor of Kings (HoK), a leading mobile multiplayer online battle arena (MOBA) game. However, deploying such powerful agents on mobile devices remains a major challenge. On one hand, the intricate multi-modal state representation and hierarchical action space of HoK demand large, sophisticated policy networks that are inherently difficult to compress into lightweight forms. On the other hand, production deployment requires high-frequency inference under strict energy and latency constraints on mobile platforms. To the best of our knowledge, bridging large-scale game AI and practical on-device deployment has not been systematically studied. In this work, we propose a Pareto-optimality-guided pipeline and design a high-efficiency student architecture search space tailored for mobile execution, enabling systematic exploration of the trade-off between performance and efficiency. Experimental results demonstrate that the distilled model achieves remarkable efficiency, including a $12.4\times$ faster inference speed (under 0.5 ms per frame) and a $15.6\times$ improvement in energy efficiency (under 0.5 mAh per game), while retaining a 40.32% win rate against the original teacher model.
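To make the notion of a Pareto-optimal trade-off concrete, the following is a minimal sketch of Pareto dominance over candidate student models. It is not the authors' pipeline: the `Candidate` type, the three chosen objectives, and all numeric values are hypothetical and serve only to illustrate how a model can be worse in win rate yet still lie on the frontier because no other candidate beats it on every objective.

```python
from typing import List, NamedTuple


class Candidate(NamedTuple):
    """Hypothetical student model with three objectives (illustrative only)."""
    name: str
    win_rate: float    # higher is better
    latency_ms: float  # lower is better
    energy_mah: float  # lower is better


def dominates(a: Candidate, b: Candidate) -> bool:
    """True if a is at least as good as b on every objective and strictly better on one."""
    no_worse = (
        a.win_rate >= b.win_rate
        and a.latency_ms <= b.latency_ms
        and a.energy_mah <= b.energy_mah
    )
    strictly_better = (
        a.win_rate > b.win_rate
        or a.latency_ms < b.latency_ms
        or a.energy_mah < b.energy_mah
    )
    return no_worse and strictly_better


def pareto_front(cands: List[Candidate]) -> List[Candidate]:
    """Keep the candidates that no other candidate dominates."""
    return [c for c in cands if not any(dominates(o, c) for o in cands)]


# Made-up values, not measurements from the paper.
candidates = [
    Candidate("large", 0.50, 6.0, 7.0),
    Candidate("medium", 0.45, 2.0, 2.5),
    Candidate("small", 0.40, 0.5, 0.5),
    Candidate("wasteful", 0.38, 1.0, 1.2),
]

front = pareto_front(candidates)
print([c.name for c in front])
```

In this toy set, "wasteful" is dropped because "small" is better on all three objectives, while "large", "medium", and "small" each represent a different point on the frontier.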
