Feasibility Consistent Representation Learning for Safe Reinforcement Learning

Zhepeng Cen; Yihang Yao; Zuxin Liu; Ding Zhao

TL;DR

This work tackles the challenge of balancing reward against sparse safety costs in reinforcement learning by introducing Feasibility Consistent Safe RL (FCSRL). It jointly learns a safety-aware latent representation using a Transition Dynamics Consistency objective and a Feasibility Consistency objective; the latter leverages a smooth feasibility score $F^\pi(s,a)$ and a distributional regression head to improve constraint estimation. Empirical results on vector-state and image-based tasks from Safety Gymnasium show that FCSRL consistently outperforms baselines, especially under stricter safety constraints, by producing embeddings that better capture safety context and support safer policy updates. The approach is compatible with standard model-free safe RL algorithms such as PPO-Lag and TD3-Lag, which makes it practical to apply in real-world safety-critical domains. Overall, FCSRL advances safe RL by embedding explicit safety awareness into representation learning, improving both constraint satisfaction and task performance.
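
PPO-Lag and TD3-Lag both enforce the constraint through a Lagrange multiplier, and that multiplier is driven by the learned cost estimate, which is why better constraint estimation matters. The sketch below shows the standard projected dual-ascent update used by Lagrangian safe RL methods in general; it is not the FCSRL code, and the function names, learning rate, and cost numbers are hypothetical.

```python
def dual_update(lmbda: float, cost_estimate: float, cost_limit: float, lr: float) -> float:
    """Projected gradient ascent on the Lagrange multiplier (kept non-negative)."""
    return max(0.0, lmbda + lr * (cost_estimate - cost_limit))


def lagrangian(reward_value: float, cost_value: float, lmbda: float) -> float:
    """Scalar objective the policy maximizes: reward minus lambda-weighted cost."""
    return reward_value - lmbda * cost_value


# Hypothetical usage: cost estimates from a cost critic over a few training iterations.
lmbda = 0.0
for est in [40.0, 35.0, 28.0, 22.0, 20.0]:
    lmbda = dual_update(lmbda, est, cost_limit=25.0, lr=0.01)
    print(f"cost estimate {est:5.1f} -> lambda {lmbda:.3f}")
```

The multiplier grows while the estimated cost exceeds the limit and shrinks once it falls below, so a biased cost estimate pushes the penalty in the wrong direction.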

Abstract

In the field of safe reinforcement learning (RL), finding a balance between satisfying safety constraints and optimizing reward performance presents a significant challenge. A key obstacle in this endeavor is the estimation of safety constraints, which is typically more difficult than estimating a reward metric due to the sparse nature of the constraint signals. To address this issue, we introduce a novel framework named Feasibility Consistent Safe Reinforcement Learning (FCSRL). This framework combines representation learning with feasibility-oriented objectives to identify and extract safety-related information from the raw state for safe RL. Leveraging self-supervised learning techniques and a more learnable safety metric, our approach enhances both policy learning and constraint estimation. Empirical evaluations across a range of vector-state and image-based tasks demonstrate that our method learns a better safety-aware embedding and achieves better performance than previous representation learning baselines.

Paper Structure (30 sections, 2 theorems, 19 equations, 8 figures, 6 tables, 1 algorithm)

Introduction
Related Work
Preliminaries
Method
Transition Dynamics Consistency
Feasibility Consistency
Summary of the Proposed Method
Comparison with Value Consistency
Experiment
Tasks
Results on Vector-state Tasks
Baselines
Evaluation Results
Results on Image-based Tasks
Performances with Different Cost Limits
...and 15 more sections

Key Result

Proposition 4.1

If the cost function $c$ is binary and the discount factor $\gamma \to 1$, then $1 - F^\pi(s,a)$ equals the probability that every subsequent state-action pair is safe, i.e., $1 - F^\pi(s,a) = \Pr_{\rho\sim\pi}\left(c(s_t,a_t) = 0 \ \forall (s_t,a_t) \in \rho\right)$, where $\rho$ is the trajectory starting from $(s,a)$ and sampled by policy $\pi$.
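
Proposition 4.1 can be checked numerically with a small Monte Carlo sketch. It assumes the feasibility score is the expected maximum discounted cost along the trajectory, $F^\pi(s,a) = \mathbb{E}_{\rho\sim\pi}[\max_t \gamma^t c(s_t,a_t)]$ (an assumed definition, since this summary does not spell it out), and uses a hypothetical toy environment in which each step is independently unsafe with probability `p_unsafe`. All names are illustrative.

```python
import random


def rollout_costs(horizon: int, p_unsafe: float, rng: random.Random) -> list[int]:
    """Binary costs of one toy trajectory: each step is unsafe (c=1) with prob. p_unsafe."""
    return [1 if rng.random() < p_unsafe else 0 for _ in range(horizon)]


def max_discounted_cost(costs: list[int], gamma: float) -> float:
    """Per-trajectory sample of the ASSUMED feasibility score: max_t gamma^t * c_t."""
    return max(gamma**t * c for t, c in enumerate(costs))


def check_prop_4_1(num_traj=10000, horizon=50, p_unsafe=0.01, gamma=1.0, seed=0):
    rng = random.Random(seed)
    trajs = [rollout_costs(horizon, p_unsafe, rng) for _ in range(num_traj)]
    # Monte Carlo estimate of F^pi(s, a) under the assumed definition.
    f_hat = sum(max_discounted_cost(c, gamma) for c in trajs) / num_traj
    # Empirical probability that every step of the trajectory is safe.
    p_all_safe = sum(1 for c in trajs if max(c) == 0) / num_traj
    return 1.0 - f_hat, p_all_safe


for g in (0.9, 0.99, 1.0):
    lhs, rhs = check_prop_4_1(gamma=g)
    print(f"gamma={g}: 1 - F = {lhs:.4f}, P(all safe) = {rhs:.4f}")
```

With `gamma=1.0` the two columns agree up to floating-point error, because $\max_t c(s_t,a_t)$ is 1 exactly when some step is unsafe. For smaller `gamma`, $1 - F$ overestimates the probability of staying safe, since late violations are discounted.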

Figures (8)

Figure 1: The pipeline of feasibility consistent representation learning. The learning objective has two parts: (1) the dynamics loss between the predicted representation $z_t$ and the target $z_t^{(m)}$, and (2) the feasibility consistency loss between $\tilde{f}_t$, predicted from the representation $z$, and the target feasibility $F^{(m)}$ estimated by Bellman bootstrapping.
Figure 2: Landscapes of the target cost value $V_c^{(m)}(s)$ and the target feasibility score $F^{(m)}(s)$ obtained by bootstrap estimation in the PointGoal2 task. The x and y axes give the agent's coordinates in state $s$. The z-axis values have been rescaled. See the appendix for more details.
Figure 3: The converged performance of different representation learning methods based on PPO-Lagrangian (top) and TD3-Lagrangian (bottom). The error bars indicate the standard deviation over 5 seeds. The green dashed line in the normalized cost plots indicates the constraint threshold.
Figure 4: Training curves on image-based tasks. The black dashed line is the cost limit. The shaded region is the standard deviation over 5 seeds.
Figure 5: Comparison of reward and cost performances with different constraint thresholds.
...and 3 more figures
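
The two losses in Figure 1 can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the authors' implementation: the dynamics loss is taken to be negative cosine similarity between $z_t$ and $z_t^{(m)}$ (a common choice in self-predictive representation learning, assumed here), the bootstrapped target uses the one-step recursion $F(s,a) \leftarrow \max\{c(s,a), \gamma F^{(m)}(s',a')\}$ (also assumed), and plain squared error stands in for the distributional regression head. All function names are hypothetical.

```python
import math


def negative_cosine(pred: list[float], target: list[float]) -> float:
    """Dynamics consistency term (assumed form): -cos(z_t, z_t^(m)), target held fixed."""
    dot = sum(p * t for p, t in zip(pred, target))
    norm = math.sqrt(sum(p * p for p in pred)) * math.sqrt(sum(t * t for t in target))
    return -dot / max(norm, 1e-8)


def feasibility_target(cost: float, next_f: float, gamma: float) -> float:
    """Assumed one-step bootstrap: F(s, a) <- max{c(s, a), gamma * F^(m)(s', a')}."""
    return max(cost, gamma * next_f)


def feasibility_loss(pred_f: float, cost: float, next_f: float, gamma: float) -> float:
    """Squared error to the bootstrapped target (stand-in for a distributional head)."""
    return (pred_f - feasibility_target(cost, next_f, gamma)) ** 2


def backward_targets(costs: list[float], gamma: float) -> list[float]:
    """Apply the recursion backward along one finished trajectory (terminal value 0)."""
    targets, nxt = [], 0.0
    for c in reversed(costs):
        nxt = feasibility_target(c, nxt, gamma)
        targets.append(nxt)
    return targets[::-1]


print(backward_targets([0.0, 0.0, 1.0, 0.0], gamma=0.9))
print(negative_cosine([1.0, 0.0], [2.0, 0.0]))
```

Along a single trajectory, the recursion reproduces $\max_k \gamma^k c_{t+k}$ for every suffix, so the cost at step 2 propagates backward as roughly 1.0, 0.9, 0.81. The resulting target decays smoothly with distance to a violation instead of staying zero until the violation occurs.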

Theorems & Definitions (5)

Proposition 4.1
Definition 4.2: Temporal smoothness
Proposition 4.3
proof
proof
