Optimal Path Planning and Cost Minimization for a Drone Delivery System Via Model Predictive Control
Muhammad Al-Zafar Khan, Jamal Al-Karaki
TL;DR
This work addresses efficient, constraint-aware drone delivery by formulating a multi-agent delivery problem as a Model Predictive Control (MPC) task and benchmarking it against three MARL baselines (IQL, JAL, VDN). The authors develop a receding-horizon MPC framework with horizon length $N$ that minimizes $J = \sum_{i=1}^{n}\sum_{j=1}^{M} c_j \cdot \mathbbm{1}_{ij} + \lambda \cdot n$ under the discrete-time dynamics $\mathbf{x}_i(k+1)=\mathbf{A}^{T}\mathbf{x}_i(k)+\mathbf{B}^{T}\mathbf{u}_i(k)$, while enforcing per-building delivery and no-fly airspace constraints. Across two grid-world environments of increasing complexity, MPC consistently converges faster and needs fewer drones to reach optimality, whereas the MARL approaches tend to deliver lower per-building costs at the expense of more drones and longer training times. The results highlight MPC’s suitability for real-time, scalable drone delivery under constraints and lay the groundwork for including more advanced MARL techniques in future benchmarking studies.
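As a rough illustration of the two formulas above, the following minimal Python sketch evaluates the cost $J$ for a given drone-to-building assignment and advances one drone state by a single step of the linear dynamics. It is not the authors' implementation. It assumes that $\mathbbm{1}_{ij}$ equals 1 when drone $i$ serves building $j$, that $c_j$ is a per-building cost, and that $\lambda$ is a per-drone penalty; the function names `delivery_cost` and `step` and all numeric values are hypothetical.

```python
import numpy as np

def delivery_cost(assign, c, lam):
    """Evaluate J = sum_i sum_j c_j * 1_ij + lam * n.

    assign: (n, M) 0/1 array; assign[i, j] = 1 is ASSUMED to mean
            that drone i serves building j.
    c:      length-M array of per-building costs c_j (assumed meaning).
    lam:    penalty per drone used (assumed meaning of lambda).
    """
    assign = np.asarray(assign, dtype=float)
    n = assign.shape[0]  # number of drones
    return float((assign * np.asarray(c, dtype=float)).sum() + lam * n)

def step(x, u, A, B):
    """One step of x(k+1) = A^T x(k) + B^T u(k) for a single drone."""
    return A.T @ x + B.T @ u

# Hypothetical example: 2 drones, 3 buildings.
assign = [[1, 0, 1],
          [0, 1, 0]]
c = [2.0, 3.0, 1.5]
print(delivery_cost(assign, c, lam=0.5))  # 7.5

# Hypothetical 2-D position state with identity dynamics and input maps.
A = np.eye(2)
B = np.eye(2)
x_next = step(np.array([0.0, 0.0]), np.array([1.0, 0.0]), A, B)
```

A receding-horizon controller would repeat such a step over the horizon $N$ inside an optimizer, apply only the first input, and re-plan; that loop and the delivery and no-fly constraints are omitted here.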
Abstract
In this study, we formulate the drone delivery problem as a control problem and solve it using Model Predictive Control (MPC). Two experiments are performed: the first on a less challenging, lower-dimensional grid-world environment, and the second on a higher-dimensional environment with added complexity. The MPC method is benchmarked against three popular Multi-Agent Reinforcement Learning (MARL) methods: Independent $Q$-Learning (IQL), Joint Action Learners (JAL), and Value-Decomposition Networks (VDN). The results show that the MPC method solves the problem faster and requires fewer drones to achieve a minimized cost and navigate the optimal path.
