Learning to Select Goals in Automated Planning with Deep-Q Learning
Carlos Núñez-Molina, Juan Fernández-Olivares, Raúl Pérez
TL;DR
This paper addresses automated planning under real-time constraints by integrating a subgoal-selection mechanism, learned with Deep Q-Learning, into a planning-enabled agent. The authors formulate goal selection as a deterministic MDP (M^g) and use a CNN to predict the remaining plan length for each candidate subgoal; the agent picks a subgoal from these predictions, and a standard PDDL planner then plans toward it. Empirical results show that the approach (DQP) is substantially more sample-efficient than vanilla Deep Q-Learning, generalizes across GVGAI Boulder Dash levels, and dramatically reduces planning time compared to a state-of-the-art planner, solving all test levels within about 2 seconds. These findings suggest that combining deliberative planning with learned subgoal selection can yield fast, scalable, and generalizable behavior in real-time environments, with potential extensions to uncertain or dynamic settings.
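The subgoal-selection step summarized above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (`select_subgoal`, `td_target`, `manhattan_predictor`) are hypothetical, a Manhattan-distance heuristic stands in for the trained CNN, and the undiscounted, cost-based Q-learning target is an assumption made here to match the "remaining plan length" interpretation, not something the summary confirms.

```python
from typing import Callable, Sequence, Tuple

# Hypothetical types for illustration: a state is the agent's grid position,
# a subgoal is a target grid position.
Position = Tuple[int, int]
Predictor = Callable[[Position, Position], float]


def manhattan_predictor(state: Position, subgoal: Position) -> float:
    """Stand-in for the learned CNN: estimates plan length as grid distance."""
    return float(abs(state[0] - subgoal[0]) + abs(state[1] - subgoal[1]))


def select_subgoal(
    state: Position, candidates: Sequence[Position], predict: Predictor
) -> Tuple[Position, float]:
    """Return the candidate with the smallest predicted plan length."""
    if not candidates:
        raise ValueError("no candidate subgoals")
    best = min(candidates, key=lambda g: predict(state, g))
    return best, predict(state, best)


def td_target(
    plan_length: float,
    next_state: Position,
    next_candidates: Sequence[Position],
    predict: Predictor,
) -> float:
    """Assumed cost-based target: actions spent reaching the chosen subgoal
    plus the best predicted remaining length from the resulting state.
    With no candidates left, the episode is treated as finished."""
    if not next_candidates:
        return float(plan_length)
    return plan_length + min(predict(next_state, g) for g in next_candidates)


if __name__ == "__main__":
    subgoals = [(3, 4), (1, 1), (5, 0)]
    best, length = select_subgoal((0, 0), subgoals, manhattan_predictor)
    print(best, length)
    remaining = [g for g in subgoals if g != best]
    print(td_target(length, best, remaining, manhattan_predictor))
```

In this toy setting the agent at (0, 0) selects (1, 1), the closest candidate. In an architecture like the one described, a planner would then compute the actual plan to the selected subgoal, and the observed plan length would feed the training target.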
Abstract
In this work we propose a planning and acting architecture endowed with a module that learns to select subgoals with Deep Q-Learning. This allows us to reduce the load on the planner in scenarios with real-time restrictions. We have trained this architecture on a video game environment used as a standard test-bed for intelligent systems applications, and tested it on different levels of the same game to evaluate its generalization abilities. We have measured the performance of our approach as more training data is made available, and compared it with both a state-of-the-art classical planner and the standard Deep Q-Learning algorithm. The results show that our model outperforms the alternative methods considered when both plan quality (plan length) and time requirements are taken into account. On the one hand, it is more sample-efficient than standard Deep Q-Learning and generalizes better across levels. On the other hand, it reduces problem-solving time compared with a state-of-the-art automated planner, at the cost of plans with only 9% more actions.
