Reinforcement Learning With Sparse-Executing Actions via Sparsity Regularization

Jing-Cheng Pang, Tian Xu, Shengyi Jiang, Yu-Ren Liu, Yang Yu

TL;DR

This work tackles reinforcement learning under sparse action budgets by formulating Sparse-Action MDP (SA-MDP) and proposing Action Sparsity REgularization (ASRE). ASRE first assesses action sparsity through constrained action sampling using a D-UCB-based selection, then learns a policy that is regularized toward a sparsity distribution $\tilde{p}$ via a KL penalty, under a regularized Bellman operator. The authors prove monotonicity and contraction for the operator and bound the regularized value difference to the true optimum, establishing theoretical validity. Empirically, ASRE improves sample efficiency and final performance on sparse-action tasks (Stock, Gunplay, Football) and generalizes to Atari games, demonstrating broad applicability. Limitations include training stability concerns and applicability primarily to discrete actions, with future work aimed at extending to continuous actions and improved off-policy sampling.
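
To make the D-UCB-based selection more concrete, here is a minimal sketch of a generic discounted UCB bandit, assuming that "D-UCB" refers to discounted UCB. Past pull counts and rewards are decayed by a factor `gamma` at every step, so recent evidence dominates the estimate. The class name `DiscountedUCB`, the parameters `gamma`, `xi`, and `bound`, and the toy reward values are hypothetical; how ASRE connects such a selector to constrained action sampling is not specified here, so this is not the authors' procedure.

```python
import math


class DiscountedUCB:
    """Generic discounted UCB bandit (illustrative; not the paper's exact procedure)."""

    def __init__(self, n_arms, gamma=0.99, xi=0.6, bound=1.0):
        self.gamma = gamma              # discount applied to past statistics
        self.xi = xi                    # exploration coefficient (hypothetical default)
        self.bound = bound              # assumed upper bound on rewards
        self.counts = [0.0] * n_arms    # discounted pull counts
        self.sums = [0.0] * n_arms      # discounted reward sums

    def select(self):
        # Pull every arm once before trusting the confidence bounds.
        for arm, count in enumerate(self.counts):
            if count == 0.0:
                return arm
        total = sum(self.counts)

        def score(arm):
            mean = self.sums[arm] / self.counts[arm]
            bonus = 2.0 * self.bound * math.sqrt(
                self.xi * math.log(max(total, 1.0)) / self.counts[arm]
            )
            return mean + bonus

        return max(range(len(self.counts)), key=score)

    def update(self, arm, reward):
        # Decay all statistics, then credit the arm that was just pulled.
        self.counts = [self.gamma * c for c in self.counts]
        self.sums = [self.gamma * s for s in self.sums]
        self.counts[arm] += 1.0
        self.sums[arm] += reward


def run_toy_bandit(steps=1000):
    """Two arms with fixed rewards 0.1 and 0.9; returns how often each was pulled."""
    toy_rewards = [0.1, 0.9]  # hypothetical deterministic rewards
    bandit = DiscountedUCB(n_arms=2)
    pulls = [0, 0]
    for _ in range(steps):
        arm = bandit.select()
        bandit.update(arm, toy_rewards[arm])
        pulls[arm] += 1
    return pulls
```

Because the statistics decay, the lower-reward arm is still revisited from time to time, which is the property that makes a discounted selector suitable when action values drift during training.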

Abstract

Reinforcement learning (RL) has demonstrated impressive performance in decision-making tasks such as embodied control, autonomous driving, and financial trading. In many of these tasks, agents must execute certain actions under limited budgets. However, classic RL methods typically overlook the challenges posed by such sparse-executing actions: both in problem formulation and in algorithm design, they assume that every action can be taken an unlimited number of times. To tackle limited action execution in RL, this paper first formalizes the problem as a Sparse-Action Markov Decision Process (SA-MDP), in which specific actions in the action space can be executed only a limited number of times. We then propose a policy optimization algorithm, Action Sparsity REgularization (ASRE), which adaptively handles each action with a distinct preference. ASRE operates in two steps. First, it evaluates action sparsity through constrained action sampling. Second, it incorporates the sparsity evaluation into policy learning through an action distribution regularization. We provide a theoretical analysis that validates the convergence of ASRE to a regularized optimal value function. In experiments on tasks with known sparse-executing actions, where classical RL algorithms struggle to train policies efficiently, ASRE effectively constrains action sampling and outperforms baselines. Moreover, we show that ASRE generally improves performance in Atari games, demonstrating its broad applicability.

Paper Structure

This paper contains 26 sections, 4 theorems, 36 equations, 13 figures, 3 tables, 1 algorithm.

Key Result

Proposition 1

Given regularized optimal value function $Q_\Omega^* (s, a)$, the regularized optimal policy $\pi_{\Omega}^*$ can be obtained by
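
As a hedged illustration of what such a closed form typically looks like, assume the regularizer is a KL penalty $\lambda\,\mathrm{KL}(\pi(\cdot \mid s)\,\|\,\tilde{p})$ with coefficient $\lambda > 0$ (a hypothetical symbol here). The maximizer of $\langle \pi, Q \rangle - \lambda\,\mathrm{KL}(\pi \,\|\, \tilde{p})$ over distributions is then the prior-weighted softmax $\pi(a \mid s) \propto \tilde{p}(a)\exp(Q(s,a)/\lambda)$. The paper's exact statement may differ; the function name `kl_regularized_policy` and the example numbers below are assumptions.

```python
import math


def kl_regularized_policy(q_values, prior, lam=1.0):
    """Return pi(a) proportional to prior(a) * exp(q(a) / lam) for a single state.

    Assumes a KL penalty toward `prior` with coefficient `lam` (illustrative only).
    """
    # Work in log space and subtract the max for numerical stability.
    logits = [
        math.log(p) + q / lam if p > 0.0 else -math.inf
        for q, p in zip(q_values, prior)
    ]
    top = max(logits)
    weights = [math.exp(logit - top) for logit in logits]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical sparsity distribution: the third action should be executed rarely.
prior = [0.8, 0.15, 0.05]

# With identical Q-values the policy falls back to the prior.
uniform_q_policy = kl_regularized_policy([1.0, 1.0, 1.0], prior)

# With a small coefficient the policy is nearly greedy with respect to Q.
near_greedy_policy = kl_regularized_policy([0.0, 1.0, 0.0], prior, lam=0.01)
```

Under this assumed form, an action with low prior mass is chosen only when its value advantage is large relative to $\lambda$, which matches the intuition of executing sparse actions sparingly.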

Figures (13)

  • Figure 1: We investigate decision-making with sparse action, where agents must execute some actions sparsely due to restricted budgets or chances. Two examples of sparse action tasks are illustrated in the figures. Left: In the BattleCity game, the tanks fire sparingly due to restricted missile amounts. Right: In a football match, players repeatedly pass, dribble, and move before shooting.
  • Figure 2: Overall workflow of ASRE comprises two parts: sparsity evaluation when exploring in the environment and sparsity regularization when training the policy.
  • Figure 3: Visualization of three sparse action tasks used in our experiments. (a) Stock: The agent buys/sells stock to earn profit. (b) Gunplay: The agent moves to the right and left while shooting at a moving target. (c) Football: The agent attempts to score while defended by a defender and goalkeeper.
  • Figure 4: Training curves of different policy optimization algorithms in diverse sparse action tasks. The x-axis represents the number of interaction steps, and the y-axis represents episodic reward. Shaded areas represent standard deviation across five runs.
  • Figure 5: Sparsity evaluation and constraining action sampling during the training process. (a): Sparsity evaluation during the training process. Each row represents the probability of one action in sparsity distribution. (b): Frequency of executing sparse action during the exploration stage. The frequency is calculated as (number of executing sparse actions) / (number of total decision steps). (c): Snapshot of the shooting point of different agents.
  • ...and 8 more figures

Theorems & Definitions (9)

  • Definition 1: Sparse-Action MDP
  • Proposition 1: Regularized Optimal Policy
  • Proposition 2: Regularized Bellman Optimality Operator
  • Proposition 3
  • Proposition 4: Value Discrepancy
  • proof
  • proof
  • proof
  • proof
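
The contraction property of a regularized Bellman optimality operator can be checked numerically on a toy problem. The sketch below assumes a KL regularizer toward a state-independent sparsity distribution, for which the backup takes the standard log-sum-exp form $(\mathcal{T}Q)(s,a) = r(s,a) + \gamma \sum_{s'} P(s' \mid s,a)\,\lambda \log \sum_{a'} \tilde{p}(a') \exp(Q(s',a')/\lambda)$. The operator form, the two-state MDP, and all names (`soft_value`, `regularized_backup`, `lam`, `prior`) are assumptions for illustration, not the paper's definitions.

```python
import math


def soft_value(q_row, prior, lam):
    """lam * log sum_a prior(a) * exp(q(a) / lam), computed stably."""
    terms = [math.log(p) + q / lam for q, p in zip(q_row, prior) if p > 0.0]
    top = max(terms)
    return lam * (top + math.log(sum(math.exp(t - top) for t in terms)))


def regularized_backup(q, rewards, transitions, prior, gamma=0.9, lam=0.5):
    """One application of the assumed KL-regularized Bellman optimality operator."""
    values = [soft_value(row, prior, lam) for row in q]
    return [
        [
            rewards[s][a]
            + gamma * sum(p * v for p, v in zip(transitions[s][a], values))
            for a in range(len(q[s]))
        ]
        for s in range(len(q))
    ]


def sup_distance(q1, q2):
    """Sup-norm distance between two Q-tables."""
    return max(abs(x - y) for r1, r2 in zip(q1, q2) for x, y in zip(r1, r2))


# Hypothetical two-state, two-action MDP; action 1 is the "sparse" action.
rewards = [[0.0, 1.0], [0.5, 0.0]]
transitions = [
    [[0.9, 0.1], [0.2, 0.8]],   # P(s' | s=0, a)
    [[0.5, 0.5], [1.0, 0.0]],   # P(s' | s=1, a)
]
prior = [0.9, 0.1]
gamma = 0.9

# Contraction check: one backup shrinks the gap between two arbitrary Q-tables.
q_a = [[0.0, 0.0], [0.0, 0.0]]
q_b = [[5.0, -3.0], [2.0, 7.0]]
before = sup_distance(q_a, q_b)
after = sup_distance(
    regularized_backup(q_a, rewards, transitions, prior, gamma),
    regularized_backup(q_b, rewards, transitions, prior, gamma),
)

# Fixed-point iteration: repeated backups converge to a regularized optimum.
q = q_a
for _ in range(200):
    q = regularized_backup(q, rewards, transitions, prior, gamma)
residual = sup_distance(q, regularized_backup(q, rewards, transitions, prior, gamma))
```

Here `after` is at most `gamma * before`, because the prior-weighted log-sum-exp is a non-expansion in the sup norm, and `residual` shrinks geometrically with the number of backups.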