A Behavior-Aware Approach for Deep Reinforcement Learning in Non-stationary Environments without Known Change Points

Zihe Liu; Jie Lu; Guangquan Zhang; Junyu Xuan

A Behavior-Aware Approach for Deep Reinforcement Learning in Non-stationary Environments without Known Change Points

Zihe Liu, Jie Lu, Guangquan Zhang, Junyu Xuan

TL;DR

The paper tackles deep reinforcement learning in non-stationary environments with unknown change points by introducing the Behavior-Aware Detection and Adaptation (BADA) framework. BADA detects environment changes by analyzing shifts in policy behavior through behavior embeddings and the Wasserstein distance $W(\cdot,\cdot)$, using a permutation test to determine change points without manually tuned thresholds. It then adapts online via a policy gradient objective that regularizes toward faster deviation from the previous optimum: $F(\theta) = \mathbb{E}_{\tau \sim \mathbb{P}_{\theta}}[\mathcal{R}(\tau)] - W(\mathbb{P}_{\theta},\mathbb{P}_{t-1}) + \delta W(\mathbb{P}_{\theta},\mathbb{P}_{pre})$, with a self-adjusted $\delta$ based on the observed change magnitude. Empirical results in ViZDoom across multiple scenarios show that BADA achieves faster adaptation and higher cumulative rewards and change-detection accuracy than baselines, demonstrating practical robustness in dynamic, unknown-change settings.

Abstract

Deep reinforcement learning is used in various domains, but usually under the assumption that the environment has stationary conditions like transitions and state distributions. When this assumption is not met, performance suffers. For this reason, tracking continuous environmental changes and adapting to unpredictable conditions is challenging yet crucial because it ensures that systems remain reliable and flexible in practical scenarios. Our research introduces Behavior-Aware Detection and Adaptation (BADA), an innovative framework that merges environmental change detection with behavior adaptation. The key inspiration behind our method is that policies exhibit different global behaviors in changing environments. Specifically, environmental changes are identified by analyzing variations between behaviors using Wasserstein distances without manually set thresholds. The model adapts to the new environment through behavior regularization based on the extent of changes. The results of a series of experiments demonstrate better performance relative to several current algorithms. This research also indicates significant potential for tackling this long-standing challenge.

A Behavior-Aware Approach for Deep Reinforcement Learning in Non-stationary Environments without Known Change Points

TL;DR

, using a permutation test to determine change points without manually tuned thresholds. It then adapts online via a policy gradient objective that regularizes toward faster deviation from the previous optimum:

, with a self-adjusted

based on the observed change magnitude. Empirical results in ViZDoom across multiple scenarios show that BADA achieves faster adaptation and higher cumulative rewards and change-detection accuracy than baselines, demonstrating practical robustness in dynamic, unknown-change settings.

Abstract

Paper Structure (12 sections, 8 equations, 8 figures, 2 tables, 1 algorithm)

This paper contains 12 sections, 8 equations, 8 figures, 2 tables, 1 algorithm.

Introduction
Related Work
Change detection in RL.
Methodology
Problem Formulation
Behavior-based Change Detection
Behavior-Aware Adaptation
Experiments and Analysis
Settings
Overall performance
Ablation Study
Conclusion

Figures (8)

Figure 1: When an outdoor robot moves from flat terrain to mountains, its speed, direction, and acceleration control changes corresponding to the changing conditions. We believe these variations can be fully captured through behavior.
Figure 2: This figure presents a t-SNE plot of behavior. The distinct clusters demonstrate the significant impact of environmental changes on behavior and inspire us to use the behavior to adapt actively to coming changes.
Figure 3: The BADA framework. When a change is detected through the behavior distribution permutation test, regularization will be added to deviate policy behavior from the previous optimum.
Figure 4: The simulated non-stationary environments. The left setting is from high-contrast simpler_basic to dimly lit basic scenario, and the right one is from defend_the_line with a rectangular map to defend_the_center with a circular map.
Figure 5: Performance comparison of different methods in non-stationary environments. The vertical dashed lines represent the points of environmental change, and the shaded areas around the reward lines indicate the standard deviation over different runs.
...and 3 more figures

Theorems & Definitions (2)

Remark 1
Remark 2

A Behavior-Aware Approach for Deep Reinforcement Learning in Non-stationary Environments without Known Change Points

TL;DR

Abstract

A Behavior-Aware Approach for Deep Reinforcement Learning in Non-stationary Environments without Known Change Points

Authors

TL;DR

Abstract

Table of Contents

Figures (8)

Theorems & Definitions (2)