Constraints as Rewards: Reinforcement Learning for Robots without Reward Functions

Yu Ishihara; Noriaki Takasugi; Kotaro Kawakami; Masaya Kinoshita; Kazumi Aoyama

TL;DR

This paper tackles the reward engineering bottleneck in robotics RL by introducing Constraints as Rewards (CaR), which expresses a task solely through constraint functions and solves the resulting RL problem via a Lagrangian dual with $r(s,a)=0$. Because the positive Lagrange multipliers are learned, CaR balances multiple objectives automatically. QRSAC-Lagrangian extends QRSAC to stabilize learning under changing target distributions. The method is applied to standing-up motion generation for Tachyon 3, a six-wheeled, telescopic-legged robot: CaR learns the target behavior in simulation and transfers to the real robot, outperforming reward-based baselines and ablations. The work provides four intuitive constraint designs and reports faster, more robust learning. This suggests that constraint-only formulations may apply widely in robotics, and it points to future work on combining constraints with rewards for maximization tasks.
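For orientation, the constraint-only formulation summarized above can be written in the standard textbook form of a constrained RL problem and its Lagrangian. This is a sketch under assumptions, not the paper's exact notation: the symbols $J_r$, $J_{c_i}$, $d_i$, and $\lambda_i$ are introduced here for illustration, and the constraint designs in the paper may take different forms.

```latex
% Assumed notation: J_r is the expected return of reward r,
% J_{c_i} the expected value of the i-th constraint function, d_i its threshold.
\begin{align}
  &\max_{\pi} \; J_r(\pi)
    \quad \text{s.t.} \quad J_{c_i}(\pi) \le d_i, \qquad i = 1, \dots, N \\
  &\mathcal{L}(\pi, \lambda)
    = J_r(\pi) - \sum_{i=1}^{N} \lambda_i \bigl( J_{c_i}(\pi) - d_i \bigr),
    \qquad \lambda_i \ge 0 \\
  % Setting r(s,a) = 0 gives J_r(\pi) = 0, so only the weighted constraints remain.
  &\min_{\lambda \ge 0} \, \max_{\pi} \;
    - \sum_{i=1}^{N} \lambda_i \bigl( J_{c_i}(\pi) - d_i \bigr)
\end{align}
```

In this form, each multiplier $\lambda_i$ plays the role that a hand-tuned reward weight would otherwise play.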

Abstract

Reinforcement learning has become an essential algorithm for generating complex robotic behaviors. However, to learn such behaviors, it is necessary to design a reward function that describes the task, which often consists of multiple objectives that need to be balanced. This tuning process is known as reward engineering and typically involves extensive trial and error. In this paper, to avoid this trial-and-error process, we propose the concept of Constraints as Rewards (CaR). CaR formulates the task objective using multiple constraint functions instead of a reward function and solves a reinforcement learning problem with constraints using the Lagrangian method. With this approach, different objectives are automatically balanced, because the Lagrange multipliers serve as the weights among the objectives. In addition, we demonstrate that constraints, expressed as inequalities, provide an intuitive interpretation of the optimization target designed for the task. We apply the proposed method to the standing-up motion generation task of a six-wheeled-telescopic-legged robot and demonstrate that the proposed method successfully acquires the target behavior, even though it is challenging to learn with manually designed reward functions.
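To illustrate how Lagrange multipliers can act as automatically tuned weights, the following is a minimal sketch of a generic dual-ascent update. It is not the paper's QRSAC-Lagrangian algorithm. The function names, the exponential parameterization that keeps the multipliers positive, the learning rate, and the toy cost values are all assumptions made for illustration.

```python
import math


def surrogate_reward(lambdas, costs):
    """Per-step learning signal when the task reward is fixed at zero.

    Only the multiplier-weighted constraint costs remain (hypothetical helper).
    """
    return -sum(lam * c for lam, c in zip(lambdas, costs))


def update_multipliers(lambdas, avg_costs, thresholds, lr=0.1):
    """One dual-ascent step for constraints of the form avg_cost <= threshold.

    The step is taken on log(lambda), so every multiplier stays strictly
    positive (an assumed parameterization, not taken from the paper).
    """
    updated = []
    for lam, cost, limit in zip(lambdas, avg_costs, thresholds):
        # Violation (cost > limit) raises the multiplier; slack lowers it.
        log_lam = math.log(lam) + lr * lam * (cost - limit)
        updated.append(math.exp(log_lam))
    return updated


# Toy example: constraint 0 is violated, constraint 1 is satisfied with slack.
lambdas = [1.0, 1.0]
thresholds = [0.1, 0.1]
avg_costs = [0.5, 0.0]
for _ in range(10):
    lambdas = update_multipliers(lambdas, avg_costs, thresholds)
```

In this toy run the multiplier of the violated constraint grows while the other shrinks, so the violated objective gains weight in the surrogate reward without any manual tuning. In an actual training loop the average costs would be re-estimated from the current policy after each update rather than held fixed.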

Paper Structure (23 sections, 18 equations, 8 figures, 6 tables, 1 algorithm)

INTRODUCTION
RELATED WORK
Reinforcement Learning with Constraints
Deep Reinforcement Learning for Legged Robots
PRELIMINARIES
METHOD
Constraints as Rewards (CaR)
Constraint Function Design
QRSAC-Lagrangian
IMPLEMENTATION
Constraint Function Design for Standing Up Task
Robot Controller Design
EXPERIMENTS
Experimental Setup
Evaluation Results
...and 8 more sections

Figures (8)

Figure 1: Standing-up motion generation task of a six-wheeled-telescopic-legged robot: Tachyon 3. The initial pose (Left) is set randomly, and the robot is requested to transition safely to the upright pose (Right).
Figure 2: Relationship between $\mathbf{u}_{z}$, $\mathbf{v}_{x}$ and $\mathbf{v}_{y}$.
Figure 3: Final poses of the robot. Leftmost figure shows the initial pose of each row. The policy trained with manually designed rewards fails to transition to the upright pose. In contrast, our proposed method (CaR) succeeds in standing up.
Figure 4: Algorithm learning curve.
Figure 5: Weight parameters.
...and 3 more figures
