Analyzing Generalization in Policy Networks: A Case Study with the Double-Integrator System

Ruining Zhang; Haoran Han; Maolong Lv; Qisong Yang; Jian Cheng

TL;DR

The paper addresses the generalization gap of DRL policy networks when the state space expands beyond training by introducing state space division theory and analyzing intrinsic network properties on a double-integrator. It demonstrates that activation saturation in $\tanh$ drives division boundaries toward linearity in distal state space, producing bang-bang-like control and inevitable overshoot, independent of the optimization algorithm used. Through artificial and realistic experiments across multiple RL algorithms, it reveals universal linear division lines, the formation of division strips, and dead zones as fundamental phenomena limiting policy generalization. The findings provide a mechanistic explanation for generalization failure in expanded state spaces and suggest avenues for extending the framework to high-dimensional settings and exploring potential remedies. Overall, the work offers a principled lens to understand and diagnose the intrinsic limitations of policy networks in continuous-control tasks.

Abstract

Extensive utilization of deep reinforcement learning (DRL) policy networks in diverse continuous control tasks has raised questions regarding performance degradation in expansive state spaces where the input state norm is larger than that in the training environment. This paper aims to uncover the underlying factors contributing to such performance deterioration when dealing with expanded state spaces, using a novel analysis technique known as state division. In contrast to prior approaches that employ state division merely as a post-hoc explanatory tool, our methodology delves into the intrinsic characteristics of DRL policy networks. Specifically, we demonstrate that the expansion of state space induces the activation function $\tanh$ to exhibit saturability, resulting in the transformation of the state division boundary from nonlinear to linear. Our analysis centers on the paradigm of the double-integrator system, revealing that this gradual shift towards linearity imparts a control behavior reminiscent of bang-bang control. However, the inherent linearity of the division boundary prevents the attainment of an ideal bang-bang control, thereby introducing unavoidable overshooting. Our experimental investigations, employing diverse RL algorithms, establish that this performance phenomenon stems from inherent attributes of the DRL policy network, remaining consistent across various optimization algorithms.
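
The saturation mechanism described in the abstract can be illustrated with a small numerical sketch. This is not the authors' code: the one-hidden-layer architecture and every weight value below are hypothetical, chosen only to make the effect visible. Along a fixed direction in the $(p, v)$ plane, each hidden $\tanh$ unit approaches $\pm 1$ as the state norm grows, so far from the origin the action depends only on the direction of the state. The boundaries between action regions then approach straight lines through the origin.

```python
import numpy as np

def policy(states, W1, b1, w2, b2):
    """Hypothetical policy: a = tanh(w2 . tanh(W1 s + b1) + b2)."""
    hidden = np.tanh(states @ W1.T + b1)
    return np.tanh(hidden @ w2 + b2), hidden

# Illustrative weights only (assumed, not taken from the paper).
W1 = np.array([[1.0, 0.5], [-0.8, 1.2], [0.3, -1.0]])  # 3 hidden units, state = (p, v)
b1 = np.array([0.2, -0.4, 0.1])
w2 = np.array([1.0, -0.7, 0.5])
b2 = 0.05

# Walk outward along one ray of the state plane.
angle = 0.7
direction = np.array([np.cos(angle), np.sin(angle)])
radii = np.array([1.0, 10.0, 1e3, 1e5])
states = radii[:, None] * direction

actions, hidden = policy(states, W1, b1, w2, b2)

# Fully saturated hidden pattern: the sign of each unit's projection onto the ray.
saturated = np.sign(W1 @ direction)
gap = np.abs(hidden - saturated).max(axis=1)

for r, g, a in zip(radii, gap, actions):
    print(f"radius={r:>9.1f}  max |hidden - sign|={g:.3e}  action={a:+.4f}")
```

In this sketch the gap between the hidden activations and their saturated signs shrinks as the radius grows, and the action stops changing with the radius. The only way to change the action far from the origin is to rotate the ray across a line perpendicular to one of the rows of `W1`, which is the linear division behavior the abstract describes.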

Paper Structure (20 sections, 18 equations, 11 figures)

Introduction
Background
Deep Reinforcement Learning
Network Based Control
Double-Integrator
State Space Division
Division Line
Division Strip
Weight Vector Significance
Unavoidable Overshoot
Experiments
Artificially Constructed Examples
Examples Trained with Realistic Conditions
Universality of Linear Division Line
Impact of Linear Division
...and 5 more sections

Figures (11)

Figure 1: Overview of the state space division theory and article structure.
Figure 2: Illustration of $\phi$ among adjacent regions.
Figure 3: (a) State trajectory (red) and state-action pattern of the ideal bang-bang control. (b) Response of $p$, $v$, and $a$ over time under the control law of Eq. (17).
Figure 4: (a) Unit circle divided by the division directions perpendicular to the weight vector. (b) State-action pattern divided by two division lines.
Figure 5: (a) The state trajectory (red) and ideal nonlinear division line (gray). (b) The response of $p$, $v$, and $a$ over time.
...and 6 more figures
