Towards a Reward-Free Reinforcement Learning Framework for Vehicle Control
Jielong Yang, Daoyuan Huang
TL;DR
RFRLF addresses reward design challenges in vehicle control by learning policies through state-based supervision instead of explicit rewards. It couples a Target State Prediction Network (TSPN) that forecasts $s_{t+1}^{pre}$ from $(s_t,a_t)$ with a Reward-Free State-Guided Policy Network (RFSGPN) trained to minimize $(s_{t+1}^{pre}-s_{t+1}^{exp})^2$, using expert states as supervision. It is evaluated in Carla and Autocar simulations and on a real TurboPi track, where it outperforms several reward-based and reward-free baselines and demonstrates stable deployment without reward signals. The work provides a practical reward-free RL framework for vehicle control and suggests future extensions to multi-modal sensing and multi-agent coordination.
Abstract
Reinforcement learning plays a crucial role in vehicle control by guiding agents to learn optimal control strategies from designed or learned reward signals. However, in vehicle control applications, rewards typically must be designed manually while accounting for multiple implicit factors, which easily introduces human bias. Although imitation learning methods do not rely on explicit reward signals, they require high-quality expert actions, which are often difficult to acquire. To address these issues, we propose a reward-free reinforcement learning framework (RFRLF). This framework directly learns target states to optimize agent behavior through a target state prediction network (TSPN) and a reward-free state-guided policy network (RFSGPN), avoiding dependence on manually designed reward signals. Specifically, the policy network is learned by minimizing the difference between the predicted state and the expert state. Experimental results demonstrate the effectiveness of the proposed RFRLF in controlling vehicle driving, showing its advantages in improving learning efficiency and adapting to reward-free environments.
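
The two-stage idea (a predictor for $s_{t+1}^{pre}$, then a policy trained on $(s_{t+1}^{pre}-s_{t+1}^{exp})^2$) can be illustrated with a toy sketch. Everything below is an assumption made for illustration rather than the authors' implementation: a hypothetical 1-D environment, linear models standing in for the TSPN and RFSGPN, a made-up expert that halves the state each step, and a predictor that is frozen while the policy is trained.

```python
import random

random.seed(0)

# Hypothetical 1-D environment: true dynamics s_next = s + a (unknown to the learner).
def env_step(s, a):
    return s + a

# Stage 1: fit a linear stand-in for the TSPN, s_pre = w_s*s + w_a*a + b,
# on random (s, a, s_next) transitions using plain SGD on the squared error.
w_s, w_a, b = 0.0, 0.0, 0.0
for _ in range(5000):
    s, a = random.uniform(-1, 1), random.uniform(-1, 1)
    err = (w_s * s + w_a * a + b) - env_step(s, a)
    w_s -= 0.05 * err * s
    w_a -= 0.05 * err * a
    b -= 0.05 * err

# Expert supervision consists of states only (no actions, no rewards).
# Assumed expert behavior: the offset from the target is halved each step.
def expert_next_state(s):
    return 0.5 * s

# Stage 2: train a linear stand-in for the policy, a = k*s + c, by minimizing
# (s_pre - s_exp)^2, with gradients flowing through the frozen predictor.
k, c = 0.0, 0.0
for _ in range(5000):
    s = random.uniform(-1, 1)
    a = k * s + c
    s_pre = w_s * s + w_a * a + b
    diff = s_pre - expert_next_state(s)
    # Chain rule: d(diff^2)/dk = 2*diff*w_a*s and d(diff^2)/dc = 2*diff*w_a
    k -= 0.05 * 2 * diff * w_a * s
    c -= 0.05 * 2 * diff * w_a

print(f"predictor: w_s={w_s:.3f}, w_a={w_a:.3f}, b={b:.3f}")
print(f"policy:    k={k:.3f}, c={c:.3f}")
```

In this toy setting the only action that reproduces the expert's next state is $a = -0.5\,s$, so the learned policy gain should approach $-0.5$ even though no reward and no expert action is ever observed. The real framework replaces both linear models with neural networks and the scalar state with vehicle observations.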
