Task Delay and Energy Consumption Minimization for Low-altitude MEC via Evolutionary Multi-objective Deep Reinforcement Learning
Geng Sun, Weilong Ma, Jiahui Li, Zemin Sun, Jiacheng Wang, Dusit Niyato, Shiwen Mao
TL;DR
The paper tackles the joint minimization of total task delay $f_1$ and UAV energy consumption $f_2$ in a UAV-assisted MEC system for the low-altitude economy (LAE). It casts the problem as a multi-objective Markov decision process and introduces an evolutionary multi-objective DRL approach (EMODRL) with a multi-objective target distribution learning (TDL) component, plus a simulated annealing (SA)-based scheduling step that reduces the action space. The resulting EMO-TDL-SA framework yields a set of non-dominated policies and, in simulations, shows better convergence and trade-off performance than the compared baselines. This enables dynamic, Pareto-aware control of UAV trajectory and offloading decisions, which suits LAE deployments whose requirements and conditions vary.
Abstract
The low-altitude economy (LAE), driven by unmanned aerial vehicles (UAVs) and other aircraft, has revolutionized fields such as transportation, agriculture, and environmental monitoring. In the upcoming sixth-generation (6G) era, UAV-assisted mobile edge computing (MEC) is particularly crucial in challenging environments such as mountainous or disaster-stricken areas. The computation task offloading problem is one of the key issues in UAV-assisted MEC, primarily addressing the trade-off between minimizing the task delay and the energy consumption of the UAV. In this paper, we consider a UAV-assisted MEC system where the UAV carries edge servers to facilitate task offloading for ground devices (GDs), and formulate a calculation delay and energy consumption multi-objective optimization problem (CDECMOP) to simultaneously improve the performance and reduce the cost of the system. Then, by modeling the formulated problem as a multi-objective Markov decision process (MOMDP), we propose a multi-objective deep reinforcement learning (DRL) algorithm within an evolutionary framework to dynamically adjust the weights and obtain non-dominated policies. Moreover, to ensure stable convergence and improve performance, we incorporate a target distribution learning (TDL) algorithm. Simulation results demonstrate that the proposed algorithm can better balance multiple optimization objectives and obtain superior non-dominated solutions compared to other methods.
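To make the notion of "non-dominated" concrete, the following is a minimal sketch of Pareto filtering over candidate policies scored by (task delay, UAV energy), both to be minimized. It is not the paper's algorithm: the function names `dominates` and `non_dominated` and the sample scores are hypothetical and chosen only for illustration.

```python
from typing import List, Tuple

# Hypothetical score of one policy: (total task delay f1, UAV energy f2).
Score = Tuple[float, float]


def dominates(a: Score, b: Score) -> bool:
    """Return True if a is no worse than b in every objective and
    strictly better in at least one (both objectives are minimized)."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def non_dominated(scores: List[Score]) -> List[Score]:
    """Keep only the scores that no other score dominates."""
    return [p for p in scores if not any(dominates(q, p) for q in scores)]


# Illustrative values only; they are not results from the paper.
candidates = [(10.0, 5.0), (8.0, 7.0), (12.0, 4.0), (11.0, 6.0), (8.0, 7.5)]
front = non_dominated(candidates)
print(front)
```

Here `(11.0, 6.0)` is dropped because `(10.0, 5.0)` is better in both objectives, and `(8.0, 7.5)` is dropped because `(8.0, 7.0)` matches its delay with lower energy. The three remaining points each represent a different delay-energy trade-off, which is the kind of policy set a multi-objective method aims to return.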
