
Variable Time Step Reinforcement Learning for Robotic Applications

Dong Wang, Giovanni Beltrame

TL;DR

This work tackles the inefficiencies of fixed-frequency reinforcement learning in robotics by introducing MOSEAC, a Multi-Objective Soft Elastic Actor-Critic that integrates action duration into the RL framework and automatically adjusts key hyperparameters during training. The approach extends SAC/SEAC to a variable-time-step setting with a multiplicative task-time-energy reward, and provides theoretical guarantees (contraction, convergence) alongside complexity analysis. Empirically, MOSEAC achieves faster convergence, improved task performance, and reduced energy and compute usage in both simulated and real-world navigation tasks on an AgileX Limo, with robust sim-to-real transfer aided by a Transformer-based environment model. The results indicate MOSEAC’s practical potential for energy-efficient, adaptive control in diverse robotic systems, offering a scalable path to broader deployment of variable-time-step RL algorithms.
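The TL;DR mentions a multiplicative task-time-energy reward but does not give its exact form here. As a purely illustrative sketch, a multiplicative combination scales the task reward by penalty factors that shrink as time and energy use grow. One consequence is that a step with zero task reward earns nothing, however cheap it was. The exponential factors and the weights `w_time` and `w_energy` are assumptions made for this example, not the paper's formula.

```python
import math


def multiplicative_reward(task_reward: float, duration: float, energy: float,
                          w_time: float = 1.0, w_energy: float = 1.0) -> float:
    """Hypothetical task-time-energy reward (not the paper's exact formula).

    The task reward is multiplied by two factors in (0, 1]:
    one that shrinks as the action duration grows, and one that
    shrinks as the energy spent during the action grows.
    """
    if duration <= 0:
        raise ValueError("duration must be positive")
    if energy < 0:
        raise ValueError("energy must be non-negative")
    time_factor = math.exp(-w_time * duration)      # assumed time penalty
    energy_factor = math.exp(-w_energy * energy)    # assumed energy penalty
    return task_reward * time_factor * energy_factor
```

Because the factors multiply the task reward rather than being added to it, the agent cannot collect reward by minimizing time or energy alone. It only benefits from saving time and energy while also making task progress.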

Abstract

Traditional reinforcement learning (RL) generates discrete control policies, assigning one action per cycle. These policies are usually implemented in a fixed-frequency control loop. This rigidity is problematic because the optimal control frequency is task-dependent: suboptimal frequencies increase computational demands and reduce exploration efficiency. Variable Time Step Reinforcement Learning (VTS-RL) addresses these issues with adaptive control frequencies, executing actions only when necessary, thus reducing computational load and extending the action space to include action durations. In this paper, we introduce the Multi-Objective Soft Elastic Actor-Critic (MOSEAC) method to perform VTS-RL, validating it through theoretical analysis and experimentation in simulation and on real robots. Results show faster convergence, better training results, and reduced energy consumption with respect to other variable- or fixed-frequency approaches.
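Extending the action space with a duration can be illustrated with a toy example that is not MOSEAC itself. A policy emits a (velocity, duration) pair for a 1-D point robot, and the environment holds that action for the chosen duration before asking for the next decision. The clipping bounds `min_dt`/`max_dt`, the energy model (|velocity| × duration), and the greedy policy are all hypothetical choices made for this sketch.

```python
from dataclasses import dataclass


@dataclass
class StepResult:
    position: float
    elapsed: float
    energy: float


def vts_step(position: float, velocity: float, duration: float,
             min_dt: float = 0.02, max_dt: float = 1.0) -> StepResult:
    """Apply one (action, duration) pair to a 1-D point robot.

    The requested duration is clipped to [min_dt, max_dt]. Energy is
    modelled as |velocity| * dt, a hypothetical stand-in for real power draw.
    """
    dt = min(max(duration, min_dt), max_dt)
    return StepResult(position + velocity * dt, dt, abs(velocity) * dt)


def greedy_policy(position: float, goal: float, speed: float = 0.5):
    """Toy policy: drive toward the goal and request just enough time to arrive."""
    distance = goal - position
    velocity = speed if distance > 0 else -speed
    return velocity, abs(distance) / speed


def rollout(goal: float, tol: float = 1e-6, max_decisions: int = 1000):
    """Run the toy policy until the goal is reached; count policy decisions."""
    position, decisions, elapsed, energy = 0.0, 0, 0.0, 0.0
    while abs(goal - position) > tol and decisions < max_decisions:
        velocity, duration = greedy_policy(position, goal)
        result = vts_step(position, velocity, duration)
        position = result.position
        elapsed += result.elapsed
        energy += result.energy
        decisions += 1
    return position, decisions, elapsed, energy
```

Calling `rollout(2.0)` reaches the goal in 4 decisions over 4.0 s of simulated time. A fixed 50 Hz loop covering the same 4.0 s would query the policy 4.0 × 50 = 200 times, which is the kind of computational saving VTS-RL aims for.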

Paper Structure

This paper contains 23 sections, 37 equations, 10 figures, 17 tables, and 1 algorithm.

Figures (10)

  • Figure 1: The workflow for our MOSEAC implementation on the AgileX Limo. We use a joystick to control the Limo's movement for the initial environmental data collection (top).
  • Figure 2: This photo depicts the real-world environment used to validate the performance of MOSEAC on the AgileX Limo. The cameras on the left, right, and middle stands are three of the four cameras that make up the OptiTrack positioning system.
  • Figure 3: The simulated lidar system casts 20 rays from the Limo's position, computes their intersections with the enclosed regions of the environment, and returns the nearest intersection point for each ray.
  • Figure 4: Average returns of 5 reinforcement learning algorithms over 2.5M steps during training.
  • Figure 5: Average energy costs of 5 reinforcement learning algorithms over 2.5M steps during training.
  • ...and 5 more figures
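The lidar simulation described in the Figure 3 caption can be sketched as 2-D ray casting against polygon edges. The caption does not specify several details, so this sketch makes its own assumptions. The 20 rays are spread evenly over 360° starting from the robot's heading. Enclosed regions are given as polygon vertex lists. A ray that hits nothing returns a point at a hypothetical `max_range`.

```python
import math
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float]
Segment = Tuple[Point, Point]


def ray_segment_hit(origin: Point, angle: float, seg: Segment) -> Optional[float]:
    """Distance t >= 0 along the ray to the segment, or None if there is no hit."""
    ox, oy = origin
    dx, dy = math.cos(angle), math.sin(angle)
    (x1, y1), (x2, y2) = seg
    ex, ey = x2 - x1, y2 - y1
    denom = dx * ey - dy * ex              # 2-D cross product d x e
    if abs(denom) < 1e-12:
        return None                        # ray parallel to segment
    # Solve origin + t*d = p1 + u*e for t (along ray) and u (along segment).
    wx, wy = x1 - ox, y1 - oy
    t = (wx * ey - wy * ex) / denom
    u = (wx * dy - wy * dx) / denom
    if t >= 0.0 and 0.0 <= u <= 1.0:
        return t
    return None


def polygon_edges(vertices: Sequence[Point]) -> List[Segment]:
    """Closed polygon -> list of edges (last vertex connects back to the first)."""
    n = len(vertices)
    return [(vertices[i], vertices[(i + 1) % n]) for i in range(n)]


def simulate_lidar(origin: Point, heading: float,
                   regions: Sequence[Sequence[Point]],
                   n_rays: int = 20, max_range: float = 10.0) -> List[Point]:
    """Return the nearest intersection point for each of n_rays evenly spaced rays."""
    edges = [edge for poly in regions for edge in polygon_edges(poly)]
    points: List[Point] = []
    for k in range(n_rays):
        angle = heading + 2.0 * math.pi * k / n_rays
        best = max_range
        for edge in edges:
            t = ray_segment_hit(origin, angle, edge)
            if t is not None and t < best:
                best = t                   # keep the closest hit on this ray
        points.append((origin[0] + best * math.cos(angle),
                       origin[1] + best * math.sin(angle)))
    return points
```

For a robot at the origin inside the square with corners (±1, ±1), every returned point lies on the square's boundary, and the ray along the heading hits (1, 0).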

Theorems & Definitions (7)

  • Definition 1
  • Definition 2
  • Definition 3
  • Definition 4
  • Definition 5
  • Definition 6
  • Definition 7