FuRL: Visual-Language Models as Fuzzy Rewards for Reinforcement Learning
Yuwei Fu, Haichao Zhang, Di Wu, Wei Xu, Benoit Boulet
TL;DR
FuRL tackles sparse-reward RL by leveraging pre-trained vision-language models (VLMs) while addressing reward fuzziness. It introduces reward alignment, which fine-tunes VLM-based rewards with lightweight projection heads and a contrastive/ranking loss, together with relay RL to escape local minima during exploration. The approach outperforms baselines on Meta-World MT10 and remains effective with pixel-based observations, suggesting that VLM-assisted online RL is practical. By combining reward alignment with staged exploration, FuRL improves sample efficiency and policy robustness in visually grounded tasks.
Abstract
In this work, we investigate how to leverage pre-trained visual-language models (VLMs) for online reinforcement learning (RL). In particular, we focus on sparse-reward tasks with pre-defined textual task descriptions. We first identify the problem of reward misalignment when VLMs are used as rewards in RL tasks. To address this issue, we introduce a lightweight fine-tuning method, named Fuzzy VLM reward-aided RL (FuRL), based on reward alignment and relay RL. Specifically, we enhance the performance of SAC/DrQ baseline agents on sparse-reward tasks by fine-tuning VLM representations and using relay RL to avoid local minima. Extensive experiments on the Meta-World benchmark tasks demonstrate the efficacy of the proposed method. Code is available at: https://github.com/fuyw/FuRL.
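To make the reward-alignment idea concrete, the following is a minimal sketch under stated assumptions, not the FuRL implementation. It assumes a frozen VLM encoder has already produced image and text embeddings, that the "lightweight projection heads" are plain linear maps applied to those embeddings, and that the fuzzy reward is the cosine similarity between the projected image and text embeddings. The ranking loss shown is one possible choice (a pairwise hinge loss asking states from successful trajectories to out-score unsuccessful ones by a margin); the helper names `matvec`, `cosine`, `fuzzy_reward`, and `ranking_loss` and the `margin` value are hypothetical.

```python
import math


def matvec(w, x):
    """Apply a linear projection head; w is a list of rows (out_dim x in_dim)."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def cosine(a, b, eps=1e-8):
    """Cosine similarity; eps guards against division by zero."""
    dot = sum(ai * bi for ai, bi in zip(a, b))
    na = math.sqrt(sum(ai * ai for ai in a))
    nb = math.sqrt(sum(bi * bi for bi in b))
    return dot / (na * nb + eps)


def fuzzy_reward(img_emb, txt_emb, w_img, w_txt):
    """Hypothetical VLM-style reward: similarity of projected image and text embeddings."""
    return cosine(matvec(w_img, img_emb), matvec(w_txt, txt_emb))


def ranking_loss(pos_embs, neg_embs, txt_emb, w_img, w_txt, margin=0.1):
    """Pairwise hinge loss (assumed form): every successful state (pos) should
    receive a reward at least `margin` higher than every unsuccessful state (neg)."""
    r_pos = [fuzzy_reward(p, txt_emb, w_img, w_txt) for p in pos_embs]
    r_neg = [fuzzy_reward(n, txt_emb, w_img, w_txt) for n in neg_embs]
    losses = [max(0.0, margin - (rp - rn)) for rp in r_pos for rn in r_neg]
    return sum(losses) / len(losses)


if __name__ == "__main__":
    # Toy 2-D embeddings stand in for real VLM outputs (assumption).
    identity = [[1.0, 0.0], [0.0, 1.0]]
    task_text = [1.0, 0.0]
    success_states = [[1.0, 0.1]]
    failure_states = [[0.0, 1.0]]
    print(ranking_loss(success_states, failure_states, task_text, identity, identity))
```

In a real setup, only the projection weights (`w_img`, `w_txt` here) would be updated by minimizing such a loss with an autodiff framework, leaving the large VLM encoder untouched; that split is what keeps the fine-tuning lightweight, though the exact parameterization used by FuRL should be checked against the released code.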
