Reinforcement Learning With Sparse-Executing Actions via Sparsity Regularization
Jing-Cheng Pang, Tian Xu, Shengyi Jiang, Yu-Ren Liu, Yang Yu
TL;DR
This work tackles reinforcement learning under sparse action budgets by formulating Sparse-Action MDP (SA-MDP) and proposing Action Sparsity REgularization (ASRE). ASRE first assesses action sparsity through constrained action sampling using a D-UCB-based selection, then learns a policy that is regularized toward a sparsity distribution $\tilde{p}$ via a KL penalty, under a regularized Bellman operator. The authors prove monotonicity and contraction for the operator and bound the regularized value difference to the true optimum, establishing theoretical validity. Empirically, ASRE improves sample efficiency and final performance on sparse-action tasks (Stock, Gunplay, Football) and generalizes to Atari games, demonstrating broad applicability. Limitations include training stability concerns and applicability primarily to discrete actions, with future work aimed at extending to continuous actions and improved off-policy sampling.
Abstract
Reinforcement learning (RL) has demonstrated impressive performance in decision-making tasks such as embodied control, autonomous driving, and financial trading. In many of these tasks, agents must execute certain actions under limited budgets. However, classic RL methods typically overlook the challenges posed by such sparse-executing actions: both their problem formulations and their algorithms assume that every action can be taken an unlimited number of times. To tackle the issue of limited action execution in RL, this paper first formalizes the problem as a Sparse Action Markov Decision Process (SA-MDP), in which specific actions in the action space can be executed only a limited number of times. We then propose a policy optimization algorithm, Action Sparsity REgularization (ASRE), which adaptively handles each action with a distinct preference. ASRE operates in two steps: first, it evaluates action sparsity by constrained action sampling; second, it incorporates the sparsity evaluation into policy learning through an action distribution regularization. We provide a theoretical analysis that establishes the convergence of ASRE to a regularized optimal value function. In experiments on tasks with known sparse-executing actions, where classical RL algorithms struggle to train a policy efficiently, ASRE effectively constrains action sampling and outperforms the baselines. Moreover, we show that ASRE generally improves performance in Atari games, demonstrating its broad applicability.
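
To make the action distribution regularization concrete, the following is a minimal sketch of a one-state backup under a KL penalty toward a sparsity distribution $\tilde{p}$. It uses the standard closed form for maximizing $\sum_a \pi(a) Q(a) - \lambda\,\mathrm{KL}(\pi \,\|\, \tilde{p})$, which is an assumption about the general shape of such a regularizer and not necessarily the exact operator defined in the paper. The function name `kl_regularized_backup`, the coefficient `lam`, and the example numbers are hypothetical.

```python
import math


def kl_regularized_backup(q_values, prior, lam):
    """Solve max_pi sum_a pi(a) * q(a) - lam * KL(pi || prior) for one state.

    Standard closed form (assumed here, not taken from the paper):
        pi(a) is proportional to prior(a) * exp(q(a) / lam)
        value = lam * log(sum_a prior(a) * exp(q(a) / lam))
    """
    scaled = [q / lam for q in q_values]
    m = max(scaled)  # subtract the max for numerical stability
    weights = [p * math.exp(s - m) for p, s in zip(prior, scaled)]
    z = sum(weights)
    policy = [w / z for w in weights]
    value = lam * (m + math.log(z))
    return policy, value


# Hypothetical two-action state: action 1 is the sparse-executing action,
# so the sparsity distribution assigns it a low probability.
q_values = [0.0, 0.5]
sparse_prior = [0.9, 0.1]
uniform_prior = [0.5, 0.5]

sparse_policy, sparse_value = kl_regularized_backup(q_values, sparse_prior, lam=1.0)
uniform_policy, uniform_value = kl_regularized_backup(q_values, uniform_prior, lam=1.0)
```

In this sketch the sparse action is selected less often under `sparse_prior` than under `uniform_prior`, even though its Q-value is higher, which is the qualitative effect a sparsity-aware regularizer is meant to have. A larger `lam` pulls the policy closer to the prior; a smaller one lets the Q-values dominate.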
