Safe Reinforcement Learning with Free-form Natural Language Constraints and Pre-Trained Language Models
Xingzhou Lou, Junge Zhang, Ziyan Wang, Kaiqi Huang, Yali Du
TL;DR
This work tackles safe reinforcement learning when constraints come as free-form natural language and ground-truth cost functions are unavailable. It proposes a cost-prediction module built from a decoder LM to condense constraints and an encoder LM to embed constraints and text observations, using a contrastive loss to align semantically similar constraints and a cosine-threshold rule to predict violations. The LM-based costs are integrated into PPO via a Lagrangian objective, enabling agents to maximize rewards while respecting a constraint budget without access to true costs. Empirical results on Hazard-World-Grid and SafetyGoal demonstrate that the method achieves strong task performance with adherence to constraints, and extensive ablations validate the necessity of both encoder and decoder LMs, as well as the contrastive objective. This approach broadens safe RL applicability by leveraging pre-trained LMs to handle diverse, free-form language constraints and reduces the need for domain-specific cost design.
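The cosine-threshold violation rule and the Lagrangian treatment of predicted costs can be illustrated with a minimal sketch. This is not the authors' implementation. The embeddings are assumed to be plain vectors that an encoder LM would produce. The function names, the threshold of 0.5, the learning rate, and the convention that similarity above the threshold signals a violation are all hypothetical choices made for illustration.

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity between two vectors; 0.0 if either has zero norm."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0

def predict_cost(obs_emb, constraint_emb, threshold=0.5):
    """Assumed rule: flag a violation (cost 1.0) when the observation
    embedding is more similar to the constraint embedding than `threshold`."""
    return 1.0 if cosine_similarity(obs_emb, constraint_emb) > threshold else 0.0

def update_multiplier(lmbda, episode_cost, budget, lr=0.05):
    """Dual ascent on the Lagrange multiplier, projected onto [0, inf):
    it grows when predicted cost exceeds the budget and shrinks otherwise."""
    return max(0.0, lmbda + lr * (episode_cost - budget))

def penalized_reward(reward, cost, lmbda):
    """Per-step Lagrangian signal: reward minus the weighted predicted cost."""
    return reward - lmbda * cost

# Toy usage with hand-made 2-D "embeddings" (stand-ins for encoder LM outputs).
constraint = [1.0, 0.0]
observations = [[0.9, 0.1], [0.0, 1.0], [1.0, 0.2]]
costs = [predict_cost(o, constraint) for o in observations]
lmbda = update_multiplier(0.0, sum(costs), budget=1.0)
```

In a full training loop, the penalized signal would feed the PPO policy update, and the multiplier would be refreshed from the predicted costs accumulated over each batch of episodes.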
Abstract
Safe reinforcement learning (RL) agents accomplish given tasks while adhering to specific constraints. Employing constraints expressed via easily-understandable human language offers considerable potential for real-world applications due to its accessibility and non-reliance on domain expertise. Previous safe RL methods with natural language constraints typically adopt a recurrent neural network, which leads to limited capabilities when dealing with various forms of human language input. Furthermore, these methods often require a ground-truth cost function, necessitating domain expertise for the conversion of language constraints into a well-defined cost function that determines constraint violation. To address these issues, we propose to use pre-trained language models (LMs) to facilitate RL agents' comprehension of natural language constraints and allow them to infer costs for safe policy learning. Through the use of pre-trained LMs and the elimination of the need for a ground-truth cost, our method enhances safe policy learning under a diverse set of human-derived free-form natural language constraints. Experiments on grid-world navigation and robot control show that the proposed method can achieve strong performance while adhering to given constraints. The use of pre-trained LMs allows our method to comprehend complicated constraints and learn safe policies without the need for ground-truth cost at any stage of training or evaluation. Extensive ablation studies are conducted to demonstrate the efficacy of each part of our method.
