Pareto-guided Pipeline for Distilling Featherweight AI Agents in Mobile MOBA Games

Xionghui Yang; Bozhou Chen; Yunlong Lu; Yongyi Wang; Lingfeng Li; Lanxiao Huang; Lin Liu; Wenjun Wang; Meng Meng; Xia Lin; Wenxin Li

TL;DR

The paper tackles the challenge of deploying powerful MOBA game AI on resource-constrained mobile devices by formulating mobile deployment as a multi-objective problem and seeking Pareto-optimal solutions. It introduces a Pareto-guided distillation pipeline that combines a featherweight student architecture with architecture search and policy distillation to balance win rate against latency, energy, memory, and model size. In HoK 3v3 experiments, the Featherweight Agent runs inference 12.4× faster and is 15.6× more energy-efficient than the teacher while maintaining a 40.32% win rate against it, which places the agent on the empirical Pareto frontier. The work demonstrates a practical, end-to-end approach to compressing large-scale, multi-modal policies for real-time mobile deployment and offers insights for future hardware-aware co-design and broader mobile-domain applications.

Abstract

Recent advances in game AI have demonstrated the feasibility of training agents that surpass top-tier human professionals in complex environments such as Honor of Kings (HoK), a leading mobile multiplayer online battle arena (MOBA) game. However, deploying such powerful agents on mobile devices remains a major challenge. On one hand, the intricate multi-modal state representation and hierarchical action space of HoK demand large, sophisticated policy networks that are inherently difficult to compress into lightweight forms. On the other hand, production deployment requires high-frequency inference under strict energy and latency constraints on mobile platforms. To the best of our knowledge, bridging large-scale game AI and practical on-device deployment has not been systematically studied. In this work, we propose a Pareto-optimality-guided pipeline and design a high-efficiency student architecture search space tailored for mobile execution, enabling systematic exploration of the trade-off between performance and efficiency. Experimental results demonstrate that the distilled model achieves remarkable efficiency, including a $12.4\times$ faster inference speed (under 0.5 ms per frame) and a $15.6\times$ improvement in energy efficiency (under 0.5 mAh per game), while retaining a 40.32% win rate against the original teacher model.

Paper Structure (39 sections, 8 equations, 8 figures, 12 tables, 1 algorithm)

Introduction
Related Work
Preliminaries
Problem Formulation
HoK Environment
Methodology
Overview
Architecture Design
Architecture Search
Distillation Training
Agent Evaluation and Selection
Experiments
Experimental Setup
The Performance of Proposed Pipeline
Comparison with Standard Compression Baselines
...and 24 more sections

Figures (8)

Figure 1: Overview of the proposed Pareto-optimality-driven distillation pipeline. The process integrates architecture design, automated search, distillation training, evaluation, and final selection, forming an end-to-end framework that jointly optimizes model performance and efficiency for mobile deployment.
Figure 2: Overview of the featherweight student architecture. The design streamlines the teacher model by removing the attention-based components and the LSTM module and adopting lightweight MLP structures. The red and blue dashed boxes indicate the simplified feature fusion module, while the cyan dashed boxes highlight the triplet max-fusion gate that enables team-level cooperation. This architecture efficiently processes multi-modal inputs and produces hierarchical action distributions.
Figure 3: The empirical Pareto frontier characterizing the performance-efficiency trade-off. The frontier is derived from the systematic assessment of all candidate agents, illustrating the set of non-dominated solutions where no improvement can be made in one objective without deteriorating the other.
Figure 4: Schematic diagram of hierarchical action space in HoK 3v3 mode.
Figure 5: Schematic diagram of the Sub Action mask after selecting the Button-Move action.
...and 3 more figures

Theorems & Definitions (2)

Definition 1: Pareto Dominance
Definition 2: Pareto Optimality and Pareto Frontier
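
The two definitions listed above, together with the frontier shown in Figure 3, can be illustrated with a minimal sketch. It uses the standard textbook notion of Pareto dominance, which may differ in detail from the paper's formal statement. The `Candidate` class, the choice of three objectives (win rate, latency, energy), and every number in `pool` are hypothetical and chosen only for illustration; they are not results from the paper.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Candidate:
    """A hypothetical student agent with one benefit and two cost objectives."""
    name: str
    win_rate: float     # higher is better (fraction of games won vs. the teacher)
    latency_ms: float   # lower is better
    energy_mah: float   # lower is better


def dominates(a: Candidate, b: Candidate) -> bool:
    """True if `a` is no worse than `b` on every objective and strictly better on at least one."""
    no_worse = (a.win_rate >= b.win_rate
                and a.latency_ms <= b.latency_ms
                and a.energy_mah <= b.energy_mah)
    strictly_better = (a.win_rate > b.win_rate
                       or a.latency_ms < b.latency_ms
                       or a.energy_mah < b.energy_mah)
    return no_worse and strictly_better


def pareto_frontier(candidates: List[Candidate]) -> List[Candidate]:
    """Keep every candidate that no other candidate dominates (simple O(n^2) scan)."""
    return [c for c in candidates
            if not any(dominates(other, c) for other in candidates)]


# Made-up numbers for illustration only; they are NOT results from the paper.
pool = [
    Candidate("A", win_rate=0.45, latency_ms=2.0, energy_mah=1.5),
    Candidate("B", win_rate=0.40, latency_ms=0.5, energy_mah=0.5),
    Candidate("C", win_rate=0.35, latency_ms=0.6, energy_mah=0.7),  # dominated by B
    Candidate("D", win_rate=0.20, latency_ms=0.2, energy_mah=0.3),
]
frontier_names = [c.name for c in pareto_frontier(pool)]
print(frontier_names)  # ['A', 'B', 'D']
```

In this toy pool, candidate C is dropped because B matches or beats it on all three objectives, while A, B, and D each win on at least one objective and therefore remain on the frontier. The quadratic scan is adequate for a small candidate pool; a larger search would likely call for a sort-based method.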
