A Survey Analyzing Generalization in Deep Reinforcement Learning

Ezgi Korkmaz

TL;DR

This paper explains the fundamental reasons why deep reinforcement learning policies encounter overfitting problems that limit their generalization capabilities, and it categorizes and explains the manifold solution approaches that increase generalization and overcome overfitting in deep reinforcement learning policies.

Abstract

Reinforcement learning research has obtained significant success and attention by utilizing deep neural networks to solve problems in high-dimensional state or action spaces. While deep reinforcement learning policies are currently being deployed in many different fields, from medical applications to large language models, there are still open questions about the generalization capabilities of these policies. In this paper, we formalize and analyze generalization in deep reinforcement learning. We explain the fundamental reasons why deep reinforcement learning policies encounter overfitting problems that limit their generalization capabilities. Furthermore, we categorize and explain the manifold solution approaches to increase generalization and overcome overfitting in deep reinforcement learning policies. From exploration to adversarial analysis and from regularization to robustness, our paper analyzes a wide range of subfields within deep reinforcement learning with a broad scope and an in-depth view. We believe our study can provide a compact guideline for the current advancements in deep reinforcement learning and help to construct robust deep neural policies with higher generalization skills.

Paper Structure (22 sections, 20 equations, 4 figures, 5 tables)

Figures (4)

  • Figure 1: Robust adversarial reinforcement learning, as proposed in pinto17. That paper models the relationship between the agent and the adversary as a zero-sum game, focusing on disturbances introduced to the environment dynamics. Its empirical studies are conducted in the MuJoCo environment.
  • Figure 2: State transformation generalization from the adversarial perspective in the Arcade Learning Environment korkmaz2023aaai. Note that in this adversarial line of research, the state transformations are constrained to be imperceptible. Columns: base frame, shifting, perspective transformation, blurring, discrete cosine transform artifacts, and brightness and contrast. Top row: JamesBond. Bottom row: BankHeist.
  • Figure 3: Meta-training of the learned policy gradient described in oh20. Right: The learned policy gradient algorithm, trained on toy examples, can generalize to more complex environments such as the Arcade Learning Environment.
  • Figure 4: Transfer in reinforcement learning as described in gamrian19, which falls under the generalization through observation category explained in Definition 3.5 (state transforming generalization). The frames are taken from the game Breakout in the Arcade Learning Environment. The left frames represent the target task, and the right frames represent the source tasks generated via generative adversarial networks.
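
The state transformations named in the Figure 2 caption (shifting, blurring, brightness and contrast) can be illustrated with a minimal NumPy sketch. This is not the procedure used in korkmaz2023aaai: the function names shift, box_blur, and brightness_contrast, their parameters, and the choice of a mean filter for blurring are all assumptions made for illustration. In the adversarial setting the caption describes, the parameters would presumably be kept small so that the change stays imperceptible.

```python
import numpy as np

def shift(frame, dx, dy):
    """Translate a frame by (dx, dy) pixels; vacated pixels are filled with zeros."""
    h, w = frame.shape[:2]
    out = np.zeros_like(frame)
    # Part of the original frame that is still visible after the translation.
    src = frame[max(0, -dy):min(h, h - dy), max(0, -dx):min(w, w - dx)]
    top, left = max(0, dy), max(0, dx)
    out[top:top + src.shape[0], left:left + src.shape[1]] = src
    return out

def brightness_contrast(frame, alpha=1.0, beta=0.0):
    """Scale contrast by alpha, add brightness offset beta, clip to [0, 255]."""
    out = alpha * frame.astype(np.float32) + beta
    return np.clip(out, 0, 255).astype(np.uint8)

def box_blur(frame, k=3):
    """Blur a 2D grayscale frame with a k x k mean filter (k odd, edge-padded)."""
    pad = k // 2
    padded = np.pad(frame.astype(np.float32), pad, mode="edge")
    out = np.zeros(frame.shape, dtype=np.float32)
    # Sum the k*k shifted views of the padded frame, then average.
    for i in range(k):
        for j in range(k):
            out += padded[i:i + frame.shape[0], j:j + frame.shape[1]]
    return np.clip(out / (k * k), 0, 255).astype(np.uint8)

# Hypothetical 84x84 grayscale observation standing in for a game frame.
rng = np.random.default_rng(0)
observation = rng.integers(0, 256, size=(84, 84), dtype=np.uint8)
transformed = brightness_contrast(box_blur(shift(observation, dx=1, dy=0)), alpha=1.02, beta=2.0)
```

A policy's sensitivity to such transformations could then be probed by comparing the actions it selects on observation and on transformed.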

Theorems & Definitions (8)

  • Definition 3.1: Generic reinforcement learning algorithm
  • Definition 3.2: Base generalization
  • Definition 3.3: Algorithmic generalization
  • Definition 3.4: Rewards transforming generalization
  • Definition 3.5: State transforming generalization
  • Definition 3.6: Transition probability transforming generalization
  • Definition 3.7: Policy transforming generalization
  • Definition 3.8: Generalization testing