A Novel Multi-Objective Reinforcement Learning Algorithm for Pursuit-Evasion Game
Penglin Hu, Chunhui Zhao, Quan Pan
TL;DR
The paper addresses pursuit-evasion games (PEG) with multiple conflicting objectives by proposing a three-objective reinforcement learning approach based on fuzzy Q-learning (FQL). It extends FQL to handle non-dominated multi-objective Q-values, employs a three-dimensional hypervolume-based evaluation with Pareto-front sampling to balance exploration and exploitation, and updates a set of non-dominated global Q-functions. Simulations demonstrate Pareto-front recovery, analyze the impact of sampling granularity, temperature, and discount factor on performance, and show that sampling reduces the computational load while preserving exploration. The approach offers a principled way to obtain diverse, Pareto-optimal strategies for PEG in continuous domains and provides practical insights for tuning learning parameters in multi-objective settings.
Abstract
In practical applications, the pursuit-evasion game (PEG) often involves multiple complex and conflicting objectives. Single-objective reinforcement learning (RL) optimizes only one objective, which makes it difficult to find the best balance among several. This paper proposes a three-objective RL algorithm based on fuzzy Q-learning (FQL) to solve the PEG with different optimization objectives. First, the multi-objective FQL algorithm is introduced, which uses the reward function to represent three optimization objectives: evading pursuit, reaching the target, and avoiding obstacles. Second, a multi-objective evaluation method and an action selection strategy based on the three-dimensional hypervolume are designed to address the exploration-exploitation dilemma. The update rule of the global strategy is then obtained by sampling the Pareto front. The proposed algorithm reduces the computational load while preserving exploration ability. Finally, the performance of the algorithm is verified through simulations.
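
To make the hypervolume-based evaluation concrete, the following is a minimal Python sketch of how a set of three-objective Q-vectors could be filtered to its non-dominated subset and scored by the volume it dominates. It is an illustration under stated assumptions, not the authors' implementation: all objectives are assumed to be maximized, the reference point `ref` is assumed to be worse than every Q-vector in each objective, and the function names (`dominates`, `non_dominated`, `hypervolume`) as well as the numbers are hypothetical. The hypervolume is computed exactly by inclusion-exclusion, which is exponential in the number of points and therefore only suitable for small sets.

```python
from itertools import combinations


def dominates(a, b):
    """True if a is no worse than b in every objective and strictly
    better in at least one (all objectives are maximized here)."""
    return (all(x >= y for x, y in zip(a, b))
            and any(x > y for x, y in zip(a, b)))


def non_dominated(points):
    """Return the points that no other point dominates (the Pareto front)."""
    return [p for p in points
            if not any(dominates(q, p) for q in points)]


def hypervolume(points, ref):
    """Exact volume of the region dominated by `points` and bounded
    below by `ref`, via inclusion-exclusion over subsets of the front.
    Cost grows as 2**len(front), so keep the front small."""
    front = non_dominated(points)
    total = 0.0
    for k in range(1, len(front) + 1):
        sign = 1.0 if k % 2 == 1 else -1.0
        for subset in combinations(front, k):
            # The boxes [ref, p] of a subset intersect in the box
            # [ref, componentwise minimum of the subset].
            vol = 1.0
            for d in range(len(ref)):
                vol *= max(0.0, min(p[d] for p in subset) - ref[d])
            total += sign * vol
    return total


# Hypothetical Q-vectors ordered as (evade pursuit, reach target, avoid obstacle).
q_vectors = [(3.0, 1.0, 1.0), (1.0, 3.0, 1.0), (1.0, 1.0, 3.0), (1.0, 1.0, 1.0)]
ref_point = (0.0, 0.0, 0.0)

front = non_dominated(q_vectors)
score = hypervolume(q_vectors, ref_point)
```

In this toy example the vector `(1.0, 1.0, 1.0)` is dominated and drops out of the front, and the three remaining boxes overlap only in the unit cube. A score of this kind could then rank candidate actions, with a larger hypervolume indicating a better and more diverse set of trade-offs.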
