Towards a Reward-Free Reinforcement Learning Framework for Vehicle Control

Jielong Yang, Daoyuan Huang

TL;DR

RFRLF addresses reward design challenges in vehicle control by learning policies through state-based supervision instead of explicit rewards. It couples a Target State Prediction Network (TSPN) that forecasts $s_{t+1}^{pre}$ from $(s_t,a_t)$ with a Reward-Free State-Guided Policy Network (RFSGPN) trained to minimize $(s_{t+1}^{pre}-s_{t+1}^{exp})^2$, using expert states as supervision. It is evaluated in Carla and Autocar simulations and on a real TurboPi track, where it outperforms several reward-based and reward-free baselines and demonstrates stable deployment without reward signals. The work provides a practical reward-free RL framework for vehicle control and suggests future extensions to multi-modal sensing and multi-agent coordination.
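One way to read the objective above, offered as a sketch rather than the paper's exact formulation: write the TSPN as $f_\phi$ and the policy as $\pi_\theta$ (both symbols are notation introduced here, and it is assumed that the policy maps $s_t$ to $a_t$). With $\phi$ frozen, the policy loss and its gradient follow from the chain rule through the predictor:

$$
s_{t+1}^{pre} = f_\phi\big(s_t, \pi_\theta(s_t)\big), \qquad
\mathcal{L}(\theta) = \big\| s_{t+1}^{pre} - s_{t+1}^{exp} \big\|^2
$$

$$
\nabla_\theta \mathcal{L}
= 2 \left( \frac{\partial \pi_\theta(s_t)}{\partial \theta} \right)^{\top}
\left( \frac{\partial f_\phi}{\partial a_t} \right)^{\top}
\big( s_{t+1}^{pre} - s_{t+1}^{exp} \big)
$$

The supervision signal therefore reaches the policy only through the action input of the predictor, which is why no expert actions and no reward values appear in the update.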

Abstract

Reinforcement learning plays a crucial role in vehicle control by guiding agents to learn optimal control strategies from designed or learned reward signals. However, in vehicle control applications, rewards typically have to be designed manually while accounting for multiple implicit factors, which easily introduces human bias. Although imitation learning methods do not rely on explicit reward signals, they require high-quality expert actions, which are often difficult to acquire. To address these issues, we propose a reward-free reinforcement learning framework (RFRLF). The framework learns directly from target states to optimize agent behavior, using a target state prediction network (TSPN) and a reward-free state-guided policy network (RFSGPN), and so avoids dependence on manually designed reward signals. Specifically, the policy network is trained by minimizing the difference between the predicted state and the expert state. Experimental results demonstrate the effectiveness of RFRLF in vehicle control, showing advantages in learning efficiency and in adapting to reward-free environments.

Paper Structure

This paper contains 21 sections, 12 equations, 6 figures, 2 tables.

Figures (6)

  • Figure 1: The architecture of our reward-free reinforcement learning framework, which consists of two parts. In the first part, we collect state-action pairs and train a target state prediction network. In the second part, we train the reward-free state-guided policy network using the predicted states provided by the target state prediction network along with the state data provided by the expert, while freezing the target state prediction network.
  • Figure 2: The structure of the target state prediction network (TSPN). In this network, the input state is first processed by the input layer and then passed through the feature extraction layer to extract key state features. The action information is then fused into these state features through the action injection layer. Finally, through the feature reconstruction and spatial decoding layers, the network generates a prediction of the target state.
  • Figure 3: Test results of different methods on the Carla environment.
  • Figure 4: Sequences of state-action pairs (time runs from left to right) for a vehicle turning left, turning right, and going straight, together with a map of the environment. On the left side of the figure, the upper row shows the first-person perspective (i.e., the states) of the vehicle during driving, and the lower row shows the actions taken by the vehicle in these states. On the right side of the figure, a complete map of the driving environment shows the approximate position and path of the vehicle in the simulated environment.
  • Figure 5: The central area is a 200×150 cm autonomous driving track, with the starting point and endpoint marked. Surrounding the track are top-down views of the vehicle driving at different positions, corresponding to the blue marking points on the track.
  • ...and 1 more figures
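
The two-part procedure described for Figure 1 can be illustrated on a toy problem. The sketch below is not the authors' implementation: it assumes a hypothetical 1-D scalar state, replaces both networks with linear models, and invents the dynamics (`true_dynamics`) and the expert (`expert_next_state`) purely for illustration. Part 1 fits a state predictor from state-action pairs; Part 2 freezes that predictor and trains the policy only from expert next states, with no reward and no expert actions.

```python
import random


def true_dynamics(s, a):
    # Hypothetical environment: the action shifts the state directly.
    return s + a


def expert_next_state(s):
    # Hypothetical expert: halves the distance to the target state 0.
    return 0.5 * s


def train_predictor(steps=2000, lr=0.05, seed=0):
    """Part 1: fit s_pre = w_s * s + w_a * a on random state-action pairs."""
    rng = random.Random(seed)
    w_s, w_a = 0.0, 0.0
    for _ in range(steps):
        s, a = rng.uniform(-1, 1), rng.uniform(-1, 1)
        err = (w_s * s + w_a * a) - true_dynamics(s, a)
        # Gradient step on the squared prediction error.
        w_s -= lr * 2 * err * s
        w_a -= lr * 2 * err * a
    return w_s, w_a


def train_policy(w_s, w_a, steps=2000, lr=0.05, seed=1):
    """Part 2: train a = k * s with the predictor weights held fixed."""
    rng = random.Random(seed)
    k = 0.0
    for _ in range(steps):
        s = rng.uniform(-1, 1)
        a = k * s
        s_pre = w_s * s + w_a * a           # frozen predictor output
        err = s_pre - expert_next_state(s)  # state-based supervision
        # Chain rule through the predictor: d(s_pre)/dk = w_a * s.
        k -= lr * 2 * err * w_a * s
    return k


w_s, w_a = train_predictor()
k = train_policy(w_s, w_a)
print(f"predictor weights: w_s={w_s:.3f}, w_a={w_a:.3f}; policy gain: k={k:.3f}")
```

In this toy setting the predictor should recover the dynamics (both weights near 1), and the policy gain should approach -0.5, the action that reproduces the expert's next state. The point of the design is that the policy update uses only the gap between predicted and expert states, propagated through the frozen predictor.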